Abstract

This paper shows that the elimination of fault-agnostic instability, the instability caused by
fault-agnostic distributed control, substantially improves BGP convergence speed. To this end, we first
classify BGP convergence instability into two categories: fault-agnostic instability and distribution-inherent
instability; secondly, we prove that it is impossible to eliminate all distribution-inherent instability in
any distributed routing protocol; thirdly, we design the Grapevine Border Gateway Protocol (G-BGP) to
show that all fault-agnostic instability can be eliminated. G-BGP eliminates all fault-agnostic
instability under different fault and routing policy scenarios by (i) piggybacking fine-grained fault
information onto the BGP UPDATE messages sent to the nodes affected by the faults, (ii) rejecting obsolete
fault information, and (iii) quickly resolving the uncertainty between link and node failure as well as the
uncertainty of whether a node has changed route.

We  evaluate  G-BGP  by  both  analysis  and  simulation.  Analytically,  we  prove  that,  by  eliminating 
fault-agnostic  instability,  G-BGP  achieves  optimal  convergence  speed  in  several  scenarios  where  BGP 
convergence  is  severely  delayed  (e.g.,  when  a  node  or  a  link  fail-stops),  and  when  the  shortest-path-first 
policy  is  used,  G-BGP  asymptotically  improves  BGP  convergence  speed  except  in  scenarios  where  BGP 
convergence  speed  is  already  optimal  (e.g.,  when  a  node  or  a  link  joins).  By  simulating  networks  with 
up to 115 autonomous systems, we observe that G-BGP improves BGP convergence stability and speed
by  factors  of  29.4  and  10.2  respectively. 

Keywords:  BGP,  fault-agnostic  instability,  distribution-inherent  instability,  convergence  speed,  path- 
vector  routing 


*This work was partially sponsored by DARPA contract OSU-RF #F33615-01-C-1901, NSF grant NSF-CCR-9972368, an
Ameritech Faculty Fellowship, and two grants from Microsoft Research.

†Email: {zhangho, anish, liuzh}@cis.ohio-state.edu; Tel: +1-614-292-{1932, 1836, 7344}; Fax: +1-614-292-2911; Web:
http://www.cis.ohio-state.edu/{~zhangho, ~anish, ~liuzh}.


1  Introduction 


The  Border  Gateway  Protocol  (BGP)  is  used  to  coordinate  routing  among  autonomous  systems  (simply 
called  ASes  hereafter)  in  the  Internet  [15].  Theoretically,  BGP  does  not  guarantee  convergence  and  allows 
persistent  route  oscillations  in  several  scenarios,  such  as  conflicting  routing  policies  and  improper  IBGP 
configurations [4, 12, 13]. In practice, however, most ASes in the Internet use the shortest-path-first route
ranking  policy  (whereby  a  path  with  the  least  hop  count  is  chosen),  and  as  a  result,  BGP  converges  with  high 
probability  [18,  25].  In  addition,  for  cases  where  the  shortest-path-first  policy  is  not  used,  solutions  have 
been  proposed  to  avoid  persistent  route  oscillations  in  BGP  [4,  9,  12,  13]. 

Our  problem  of  interest,  therefore,  is  the  scenario  where  BGP  does  converge,  but  its  convergence  exhibits 
instability  (i.e.,  allowing  unnecessary  route  changes)  and  is  potentially  slow  (e.g.,  taking  up  to  15  minutes 
after  the  disconnection  of  a  single  AS  [17,  18]).  Instability  during  convergence  is  undesirable.  First,  it 
increases  the  probability  of  message  reordering,  which  is  not  only  undesirable  for  multimedia  applications, 
but  also  increases  the  probability  of  undesirable  timer  expiration  in  protocols  such  as  TCP  and  IP.  Second, 
instability  increases  delay  jitter  in  packet  delivery.  And  finally,  instability  increases  packet  loss  (e.g.,  due 
to  TTL  expiration)  [17].  Slow  convergence  of  BGP  is  also  undesirable,  because  it  not  only  deteriorates 
packet  delivery,  but  also  amplifies  the  effect  of  BGP-related  problems  under  stressful  conditions  such  as  the 
Code  Red/Nimda  attack  [26].  Moreover,  the  two  sub-problems  are  related:  the  interaction  of  unstable  BGP 
convergence  and  the  BGP  route  flap  damping,  for  instance,  can  delay  BGP  convergence  further,  in  addition 
to  entailing  loss  of  reachability  for  hours  [20]. 

Related work. To improve BGP convergence speed, the methods of “consistency assertions” [23] and
“ghost flushing” [6] have been proposed. The former captures consistency properties between neighboring
nodes, and the latter withdraws old routes faster than propagating new routes. However, “consistency
assertions” do not deal with slow BGP convergence that results from inconsistency between nodes multiple hops
away, and neither of the two approaches can remove all convergence-malign instability, the major cause for
slow BGP convergence (to be discussed in Section 3.1). Moreover, the nature of different types of instability
during BGP convergence, the fundamental limits on improving BGP convergence stability and speed, and
the impact of fault types and routing policies on protocol convergence behaviors are not the focus of [23]
and [6]. In addition, the “consistency assertions” method propagates the entry-router-id of one AS to other
ASes, although this attribute sits below the level of ASes. Thus local changes of entry-router within an AS
(even when the AS-path does not change) propagate to other ASes, which are potentially far away, and cause
unnecessary route changes to propagate.

An  application-layer  approach,  resilient  overlay  networks  [2],  is  also  proposed  to  deal  with  slow  BGP 
convergence.  But  the  approach  is  not  scalable  in  the  sense  that  each  node  in  a  network  maintains  information 
about  the  whole  network,  and  the  approach  does  not  improve  the  convergence  behaviors  of  BGP. 

In [17] and [18], the delayed BGP convergence and the impact of Internet policy as well as topology
on BGP convergence are studied. But the study considers only the shortest-path-first policy, and does not
consider scenarios where a node or a link joins. In [22], a real-time model for BGP convergence is proposed,
but the analysis only considers the case when a destination node joins a network. Moreover, no solution for
improving BGP convergence behavior is proposed in [17], [18], or [22].

Contributions of the paper. We study the nature of instability during BGP convergence and classify the
instability into two categories: fault-agnostic instability and distribution-inherent instability. Fault-agnostic
instability is the major cause for slow BGP convergence, and distribution-inherent instability is intrinsic to
distributed protocols. To understand the fundamental limit on improving BGP convergence, we prove that it
is impossible to eliminate all distribution-inherent instability in any stateful distributed routing protocol.

Then, we refine BGP to obtain a new protocol G-BGP (for Grapevine-BGP) that eliminates all
fault-agnostic instability. G-BGP achieves this via three mechanisms. First, it propagates fine-grained
information about faults to the nodes that are affected by the faults. Second, it rejects obsolete fault
information by enforcing a total order on all the fault information regarding the same AS that is sent out
from the AS at different times. Third, it quickly resolves the uncertainty between link and node failure as
well as the uncertainty of whether a node has changed route.

Towards  analyzing  the  convergence  behaviors  of  G-BGP  and  BGP,  we  introduce  policy  graphs  as  a 
modelling  tool  for  inter-AS  routing.  Using  policy  graphs,  we  prove  that  G-BGP  eliminates  all  fault-agnostic 
instability  under  different  fault  and  routing  policy  scenarios.  We  also  prove  that,  by  eliminating  fault- 
agnostic  instability,  G-BGP  converges  at  an  asymptotically  optimal  speed  in  several  scenarios  where  BGP 
convergence  is  severely  delayed  (e.g.,  when  a  node  or  a  link  fail-stops),  and  when  the  shortest-path-first 
policy  is  used,  G-BGP  asymptotically  improves  BGP  convergence  speed  except  in  scenarios  where  BGP 
convergence  speed  is  already  optimal  (e.g.,  when  a  node  or  a  link  joins). 

We  also  evaluate  G-BGP  by  simulation  with  realistic  Internet-type  network  topologies.  The  simulation 
shows  that,  for  networks  with  up  to  115  ASes,  G-BGP  improves  BGP  convergence  stability  and  speed  by 
factors  of  29.4  and  10.2  respectively,  and  the  improvement  in  G-BGP  increases  as  network  size  increases. 
The simulation also shows that, when routing policies other than “shortest-path-first” are used, G-BGP
improves  BGP  convergence  stability  and  speed  in  all  fault  scenarios. 

Moreover,  G-BGP  is  scalable  along  a  number  of  dimensions.  First,  fault  information,  most  of  which 
is  piggybacked  in  UPDATE  messages,  consumes  little  network  bandwidth,  and  fault  information  is  either 
not  stored  or  only  temporarily  stored  at  nodes.  Second,  the  degree  of  improvement  in  G-BGP  increases 
as  network  size  increases.  Third,  G-BGP  does  not  expose  additional  intra-AS  attributes  and  thus  does  not 
introduce  additional  instability  that  is  due  to  local  state  changes  within  an  AS.  And  finally,  each  node  only 
maintains  routes  of  its  immediate  neighbors. 

Organization  of  the  paper.  In  Section  2,  we  present  the  network  model,  fault  model,  as  well  as  protocol 
notation,  and  we  briefly  describe  BGP.  In  Section  3,  we  study  the  nature  of  BGP  convergence  instability  and 
present the G-BGP design. Then, we introduce policy graphs in Section 4, and in Section 5, we analyze the
convergence  stability  as  well  as  speed  in  G-BGP  and  BGP.  We  present  our  simulation  results  in  Section  6.  In 
Section  7,  we  discuss  the  implementation  as  well  as  deployment  considerations  of  G-BGP,  and  we  discuss 
approaches  to  reducing  distribution-inherent  instability.  Section  8  concludes  the  paper. 

2  Preliminaries 

In  this  section,  we  present  the  network  model,  fault  model,  and  protocol  notation.  We  also  briefly  describe 
the  Border  Gateway  Protocol  (BGP). 

Network  model.  A  network  G  is  an  undirected  graph  (V,  E,  P),  where  V  and  E  are  the  set  of  nodes  (i.e., 
BGP  speakers)  and  the  set  of  links  in  the  network  respectively,  and  P  is  the  function  that  defines  the  routing 
policies  of  each  node.  V  is  divided  into  several  subsets,  each  of  which  is  an  AS;  nodes  within  the  same  AS 
are connected (AS partition is discussed in Section 7). Each node has a unique node-id, and all the nodes
in the same AS have the same AS-id. For a node i, the id of its AS is denoted by i.AS. For any two nodes
i and j, (i, j) is in E if i and j can communicate with each other directly, or if i and j are in the same AS.
For any two ASes I and J, there is a channel (I, J) between I and J if there exist two nodes i and j such
that i ∈ I, j ∈ J, and (i, j) ∈ E. For an AS I and a node j, I is a neighboring AS of j if there is a channel
between I and j.AS. For a node i, its neighboring node j is an internal neighbor if j is in the same AS as
i; otherwise, j is an external neighbor of i.
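
As a minimal sketch of this model (an illustration only, not part of the protocol), the following Python fragment stores nodes, their AS membership, and links, and derives channels and neighbor kinds from them. The class name `Network` and its method names are hypothetical, and intra-AS links are assumed to be added explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    """Toy model: a node-id -> AS-id map plus a set of undirected links."""
    as_of: dict                                   # i.AS for every node i
    links: set = field(default_factory=set)

    def add_link(self, i, j):
        # An undirected link (i, j); i and j are assumed to be distinct nodes.
        self.links.add(frozenset((i, j)))

    def has_channel(self, as_i, as_j):
        # Channel (I, J): some link joins a node of I to a node of J.
        return any({self.as_of[n] for n in link} == {as_i, as_j}
                   for link in self.links)

    def neighbor_kind(self, i, j):
        # "internal" if i and j share an AS, "external" otherwise;
        # None if i and j are not linked at all.
        if frozenset((i, j)) not in self.links:
            return None
        return "internal" if self.as_of[i] == self.as_of[j] else "external"
```

Storing each link as a `frozenset` keeps the graph undirected, so (i, j) and (j, i) denote the same link.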

Message transmission between nodes is reliable, and message passing delay across a link is bounded
from below and from above by L_d and U_d respectively.

There is a clock at each node. The ratio of clock rates between any two nodes is bounded from above
by α, but no extra constraint on the absolute values of clocks is enforced. (α tends to be quite small, given
today’s high-precision clocks.)

For  clarity  of  presentation,  we  only  consider  one  destination  d,  an  address  prefix  representing  a  set  of 
nodes  in  an  AS  d.AS.  (Our  protocol  readily  applies  to  other  destinations.) 

Fault  model.  A  node  or  a  link  is  up  if  it  functions  correctly,  and  it  is  down  if  it  fail-stops.  In  a  network,  an 
up  node  or  link  can  fail-stop  and  become  down;  a  down  node  or  link  can  become  up  and  join  the  network; 
routing policies of ASes can change. A channel (I, J) is up if there is at least one up link between ASes
I and J; otherwise, the channel is down. An AS is up if there is at least one up node in the AS; otherwise,
the  AS  is  down. 

The  fail-stop  of  a  node  is  divided  into  two  categories:  graceful  fail-stop  where  a  node  announces  to  its 
neighbors  when  it  fail-stops,  and  gross  fail-stop  where  a  node  fail-stops  silently.  An  AS  fail-stops  gracefully 
if  all  the  nodes  in  it  fail-stop  gracefully. 

Due to faults, a network G may change dynamically in the sense that its topology or routing policy
function changes, where the topology of G is the subgraph G'(V', E') of G(V, E) such that
V' = {i : i ∈ V ∧ i is up} and E' = {(i, j) : i ∈ V' ∧ j ∈ V' ∧ (i, j) ∈ E ∧ (i, j) is up}. To reflect
changes in network topology and routing policy function, we regard the state of G as the union of the
network topology, the routing policy function, and the state of all the up nodes, with the state of a node
being the values of the variables maintained at the node. At a network state q, the network topology and the
route of a node i are denoted by G.q(V.q, E.q) and i.AS-path.q respectively.
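
The fault-model definitions can be restated as a short Python sketch. The function names and the dictionary-based representation of up/down status are assumptions made for illustration; they do not come from the paper.

```python
def up_topology(nodes, links, node_up, link_up):
    """Return (V', E'): the up nodes, and the up links whose endpoints are both up."""
    v_up = {i for i in nodes if node_up[i]}
    e_up = {(i, j) for (i, j) in links
            if i in v_up and j in v_up and link_up[(i, j)]}
    return v_up, e_up


def channel_up(as_i, as_j, links, link_up, as_of):
    """A channel (I, J) is up if at least one link between ASes I and J is up."""
    return any(link_up[(i, j)] and {as_of[i], as_of[j]} == {as_i, as_j}
               for (i, j) in links)


def as_up(as_id, nodes, node_up, as_of):
    """An AS is up if at least one of its nodes is up."""
    return any(node_up[i] for i in nodes if as_of[i] == as_id)
```

Here `links` is a collection of node pairs, and `node_up` and `link_up` map each node or link to a boolean.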

Protocol  notation.  We  write  protocols  using  the  guarded  command  notation  [10].  At  each  node,  the 
protocol  consists  of  a  finite  set  of  variables  and  actions.  Each  action  consists  of  two  parts:  guard  and 
statement.  For  convenience,  we  associate  a  unique  name  with  each  action.  Thus,  an  action  has  the  following 
form: 

⟨name⟩ :: ⟨guard⟩ → ⟨statement⟩

The guard is either a boolean expression over the protocol variables of the node or a message reception
operation; the statement updates zero or more protocol variables of the node, and/or sends out some message(s).
An  action  is  enabled  if  its  guard  evaluates  to  true.  An  action  is  executed  only  if  it  is  enabled.  To  execute  an 
action,  its  statement  is  executed  atomically. 
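
To make the notation concrete, here is a small Python sketch of guarded actions. It is only an illustration: the `Action` class, the `step` scheduler (which runs the first enabled action it finds), and the example node state are all assumptions, not constructs from the paper.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    name: str
    guard: Callable[[dict], bool]        # the action is enabled when this is True
    statement: Callable[[dict], None]    # updates variables and/or sends messages


def step(state, actions):
    """Execute one enabled action (the first found); return its name, or None."""
    for action in actions:
        if action.guard(state):
            action.statement(state)      # single-threaded here, so effectively atomic
            return action.name
    return None


# Hypothetical node: a message-reception guard that adopts the received route.
def adopt(state):
    state["route"] = state["inbox"].pop(0)


actions = [Action("recv_update", lambda s: bool(s["inbox"]), adopt)]
state = {"inbox": [("c", "w", "d")], "route": None}
fired = [step(state, actions), step(state, actions)]
```

The second call to `step` returns `None` because the inbox is empty by then, so no guard evaluates to true.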

Border  Gateway  Protocol  (BGP).  In  BGP,  UPDATE  messages  are  passed  between  nodes  to  convey  routing 
information. To reduce instability, BGP employs an MRAI timer (which is 30 seconds by default) such that
a node sends out at most one non-withdrawal UPDATE message within any MRAI time. Two neighboring
nodes also periodically exchange Keep-Alive messages to monitor the state of the link between them.

BGP UPDATEs are route records that include the following attributes (among others):


nlri         network layer reachability information (i.e., the destination address);
next-hop     the next hop;
AS-path      ordered list of ASes traversed, with more-recently-visited ASes placed
             in front of less-recently-visited ASes;
local_pref   local preference;
med          multi-exit discriminator.


Each route r is associated with a 3-tuple rank(r), defined as (r.local_pref, 1/|r.AS-path|, −r.next-hop). For
destination d, a node i chooses its route via the following two steps [24]:

•  First,  among  all  the  routes  learned  from  a  neighboring  AS,  i  only  considers  the  route  with  the  lowest 
med  value; 

• Second, for all the routes to be considered, i ranks them in lexical order by rank(·), and i selects as
its  route  the  one  with  the  highest  rank. 

Given a route r available to a node i, attribute r.local_pref is determined by the route ranking
policy of i. We call the ranking policy that assigns a constant value to r.local_pref the shortest-path-first
policy or the SPF policy, where a route with the shortest AS-path ranks the highest. We call a route ranking
policy other than the SPF policy a non-SPF policy.
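
The two-step selection can be sketched in Python as follows. This is a hedged illustration rather than the paper's implementation: the `Route` class is hypothetical, the neighboring AS of a route is assumed to be the first entry of its AS-path, next hops are assumed to be numeric ids, and the sign of the third rank component (a lower next-hop id breaks ties) is an assumption.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Route:
    as_path: tuple    # most recently visited AS first; assumed non-empty
    next_hop: int     # hypothetical numeric id of the next hop
    local_pref: int
    med: int


def rank(r):
    # Assumed 3-tuple: higher local_pref, then shorter AS-path, then lower next-hop id.
    return (r.local_pref, Fraction(1, len(r.as_path)), -r.next_hop)


def select(routes):
    """Two-step selection: lowest med per neighboring AS, then highest rank."""
    # Step 1: for each neighboring AS (assumed to be as_path[0]), find the lowest med.
    lowest = {}
    for r in routes:
        nbr = r.as_path[0]
        lowest[nbr] = min(lowest.get(nbr, r.med), r.med)
    candidates = [r for r in routes if r.med == lowest[r.as_path[0]]]
    # Step 2: tuples compare lexically, so max() picks the highest-ranked route.
    return max(candidates, key=rank) if candidates else None
```

With a constant `local_pref` across all routes, `select` reduces to the SPF policy: the candidate with the shortest AS-path wins.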

Besides route ranking policy, routing policies such as export and import policies are used in BGP. The
export policy of a node i defines the set of export neighbors of i to which i announces its route; the import
policy of i defines the set of import neighbors of i whose routes are accepted by i. If a node i exports routes
to or imports routes from a node in a neighboring AS J, we say, for convenience, i exports routes to or
imports routes from J respectively. It is recommended, and is common practice, that nodes within
the same AS share the same routing policies [14, 24].


3  The  G-BGP  protocol 

The  objective  of  this  paper  is  to  design  a  protocol  that,  given  a  network  and  a  destination  where  BGP 
converges  in  the  presence  of  faults,  reduces  the  number  of  route  changes  during  BGP  convergence,  as  well  as 
the  time  taken  for  BGP  to  converge.  To  achieve  the  objective,  we  first  study  the  nature  of  BGP  convergence 
instability  and  its  relationship  to  BGP  convergence  speed;  we  then  design  protocol  G-BGP  to  improve  BGP 
convergence  stability  and  speed. 

3.1  Instability  during  BGP  convergence 

We  identify  fault-agnostic  instability  and  distribution-inherent  instability,  analyze  their  causes,  and  discuss 
their  relationship  with  BGP  convergence  speed. 

Fault-agnostic instability. Fault-agnostic instability is the type of instability that is incurred at a node
which adopts an invalid route even though some information regarding the fault that invalidates the route
has reached the node. Fault-agnostic instability and its propagation are the major causes for slow BGP
convergence, as observed in [17] and [20], among others.

In BGP, when a fault occurs in a network, certain coarse-grained information about the result of the fault,
such as an UPDATE message signalling a modified or a withdrawn route, is propagated so that the network
eventually converges to a stable state. However, the coarse-grained information does not tell what exactly
the fault is or where the resulting route changes first occurred. Therefore, when a node receives the
coarse-grained information, the node may still adopt a route invalidated by the fault, in which case unnecessary
route changes (i.e., instability) are incurred. The instability incurred at a node can propagate to others and
delay the convergence of BGP. Even worse, instability can activate route-flap damping, which suppresses
routes going through unstable nodes, leading to a loss of reachability as well as a delay in BGP convergence
(potentially for hours) [20].

To give an example, let us consider a network state q where the network topology and the routing tree
rooted at d are shown in Figures 1(a) and 1(b) respectively; for the three backup routes of g at state q, route
[f, b, a, d] ranks the highest, followed by [j, h, a, d], and [c, w, d] ranks the lowest. If a fail-stops at state q,
nodes m, f, and j will withdraw their routes [m, b, a, d], [f, b, a, d], and [j, h, a, d] respectively. However,
the resulting route-withdrawal UPDATE messages do not signal the fact that a and its associated links have
fail-stopped. Therefore, if f has not withdrawn [f, b, a, d] when m withdraws [m, b, a, d] (due to different
delays along the two routes), g will adopt [f, b, a, d] as its route even though [f, b, a, d] has been invalidated
by the fail-stop of a. Similarly, if j has not withdrawn [j, h, a, d] when f withdraws [f, b, a, d] later, g
will adopt [j, h, a, d] even though it has also been invalidated by the fail-stop of a. g will not change to
its final stable route [c, w, d] until j withdraws [j, h, a, d]. Therefore, g changes route three times during
convergence, with the first two changes being unnecessary. Even worse, the unnecessary route changes at g
can propagate and cause unnecessary route changes at other nodes, such as m, L, f, and b, when g announces
[g, f, b, a, d] or [g, j, h, a, d] to its export neighbors.

Figure 1: An example network state, with panels (a) Network topology and (b) Routing tree. For simplicity,
each node in the figure also represents its AS.
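
The sequence of route changes at g in this example can be replayed with a short Python sketch. The function `replay` and its `failed_node` parameter are hypothetical: the parameter merely illustrates what g could do if it knew which node had fail-stopped, and it is not a description of the actual G-BGP mechanism.

```python
def replay(preference, withdrawals, failed_node=None):
    """Return the routes a node adopts, in order, as withdrawals arrive.

    preference:  known routes in decreasing rank; the first is the current route.
    withdrawals: withdrawn routes, in arrival order.
    failed_node: if given, every route through this node is discarded as soon
                 as the first withdrawal arrives (illustrative assumption).
    """
    available = list(preference)
    current = available[0]
    adopted = []
    for w in withdrawals:
        if w in available:               # it may already have been discarded
            available.remove(w)
        if failed_node is not None:
            available = [r for r in available if failed_node not in r]
        best = available[0] if available else None
        if best != current:
            current = best
            adopted.append(best)
    return adopted


via_m, via_f = ("m", "b", "a", "d"), ("f", "b", "a", "d")
via_j, via_c = ("j", "h", "a", "d"), ("c", "w", "d")
blind = replay([via_m, via_f, via_j, via_c], [via_m, via_f, via_j])
informed = replay([via_m, via_f, via_j, via_c], [via_m, via_f, via_j], failed_node="a")
```

In this replay, `blind` contains three adopted routes, matching the three route changes described above, whereas `informed` contains only the final stable route [c, w, d].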

Distribution-inherent  instability.  Distribution-inherent  instability  is  the  type  of  instability  that  is  incurred 
(i)  at  a  node  which  adopts  an  invalid  route  because  no  information  regarding  the  fault  that  invalidates  the 
route  has  reached  the  node,  or  (ii)  at  a  node  which  adopts  a  valid  route  that  becomes  either  invalid  or  lower- 
ranked  than  some  other  route  later. 

To give an example of type-(i) distribution-inherent instability, let us consider again the network state q
as shown in Figure 1. If node b and link (a, h) fail-stop simultaneously, and if m as well as f withdraws its
route earlier than j does, then no information that is generated due to the fail-stop of (a, h) will have reached
g when it receives the route-withdrawal messages from m and f. Therefore, g will choose [j, h, a, d] as its
new route, which will be withdrawn later. Thus, an unnecessary route change is incurred at g before it
chooses its final stable route [c, w, d].

To give an example of type-(ii) distribution-inherent instability, let us consider a network state where
the network topology is the same as that in Figure 1(a) but no node has learned any route to d. Then d will
announce its existence, and other nodes such as a, b, m, and f will learn their routes gradually. If f exports
its route to g earlier than m does, g will choose [f, b, a, d] as its route first. When m exports [m, b, a, d] to g
later, g will change its route to [m, b, a, d], since [m, b, a, d] ranks higher than [f, b, a, d] does. Thus, there is
an unnecessary route change at g, even though the route [f, b, a, d] adopted by g is valid.
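
The dependence on arrival order in this type-(ii) example can be shown with a minimal Python sketch. The function `adopt_on_arrival` and the numeric rank values are hypothetical; the only property taken from the text is that the route via m ranks higher at g than the route via f.

```python
def adopt_on_arrival(arrivals, rank):
    """Adopt each arriving route only if it outranks the current one.

    Returns the list of routes adopted, in order.
    """
    current = None
    history = []
    for route in arrivals:
        if current is None or rank[route] > rank[current]:
            current = route
            history.append(route)
    return history


via_m = ("m", "b", "a", "d")
via_f = ("f", "b", "a", "d")
rank = {via_m: 2, via_f: 1}          # illustrative values: via_m ranks higher at g

f_first = adopt_on_arrival([via_f, via_m], rank)
m_first = adopt_on_arrival([via_m, via_f], rank)
```

When the route via f arrives first, g adopts two routes in turn (one extra change); when the route via m arrives first, g adopts only that route.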

Unlike fault-agnostic instability, distribution-inherent instability does not cause long delays in BGP
convergence (as observed in Section 6 and in [20]). Moreover, distribution-inherent instability exists in every
distributed routing protocol, as Theorem 1 shows.

Theorem  1  (Impossibility  of  eliminating  all  distribution-inherent  instability)  In  a  network,  if  message 
passing  delay  along  links  is  greater  than  zero,  route  ranking  policies  are  not  shared  among  ASes,  and  faults 
are  independent  of  one  another,  then  it  is  impossible  to  eliminate  all  distribution-inherent  instability  in  any 
stateful  distributed  routing  protocol. 

Proof: In inter-AS routing, besides network topology and export as well as import policies, route ranking
policies adopted at ASes determine the route chosen by an AS. When route ranking policies are not shared
among ASes (which is the common practice in the Internet), a node j cannot predict the route taken by other
nodes even if j can learn the whole network topology and the export and import policies of other ASes.
Therefore, a node can only choose and set up its route based on the routes adopted by its import neighbors. This
fact, together with unpredictability of faults and greater-than-zero link delay, results in the impossibility of
completely avoiding distribution-inherent instability as explained below.

When faults occur in a network, the routes of those nodes where the faults have occurred may change.
Then the export neighbors of these nodes may change accordingly. A node k may choose as its route a route
r exported from one of its import neighbors k' before k learns the best route r' available to k, and r is a
temporary route for k even though r may be the final stable route (i.e., the best route available) of k'. This
is due to the following two reasons:

a) The best route r' available to k under a given network topology and given routing policies may reach k
after a less preferred route r available to k has reached k, because message passing delays along different
routes are different as well as non-zero;

b) k cannot predict for sure the existence of r' or the delay between the receipt of r and r', because routing
policies are not shared among ASes.

Therefore, there are two alternatives k can adopt:

• k always chooses the best route it has learned of so far: in this case, it is trivially true that k may
choose r before it chooses r'.

• k waits for some time t_k before choosing the best route it has learned after k has learned some
change(s) in network state: in this case, we can always construct an instance of the problem where
there exists some k that chooses a less preferred route r before k learns the best route r' available
to it.

Consider a network where the topology is a complete graph and the number of nodes is greater than
4. Then for any set of values t_{k''}, one for every node k'' in the network, we can always find four values
t_{k0}, t_{k'}, t_k, and t_{k1} such that t_{k0} + t_{k'} > t_k > t_{k1}. If the routing policy at node k is such
that k prefers the route that goes through k0 and k' to the route that goes through k1, then k chooses the
route that goes through k1 (i.e., the route r) before it learns the more-preferred route that goes through
k0 and k' (i.e., the route r').

Therefore,  a  node  k  may  choose  and  advertise  to  its  export  neighbors  some  temporary  routes  before  it 
reaches  its  final  stable  state. 

Thus, generally speaking, for a node i, when it has received some routes exported from its import
neighbors after the occurrence of faults, i may choose as its route to the destination the route r from one
import neighbor j such that r is a transient route of j.¹ Therefore, i has to change its route after j changes
its route from r to its final stable route r', and instability (i.e., extra route-change) is incurred at i. This
instability is the kind of distribution-inherent instability where the adopted route at a node is valid but is
transient. This kind of distribution-inherent instability is partly inherent in the fact that the routing policies
are not shared among ASes, and is impossible to completely avoid in any distributed routing protocol.

In distributed routing protocols, the propagation of the kind of instability where the adopted route at a
node is valid but transient can result in the other kind of distribution-inherent instability where the adopted
route is invalid because no information about the invalidity of the route could have reached the node when
it decides to adopt the route. (The reasoning is the same as that for proving that there exists k that chooses
a route r before learning its final stable route r'. For simplicity, we omit it here.) Following the example
discussed in the last paragraph, after j changes its route from r to r', we consider another node k that
chooses as its route a route r'' that includes r when no (implicit or explicit) information regarding the route
change from r to r' at j has reached k due to non-zero link delay. Then k will change its route at least once
more later on, since j is not using route r anymore.

Moreover, if multiple faults occur in a network simultaneously, there can also exist the kind of
distribution-inherent instability where the adopted route is invalid because no information about the invalidity
of the route could have reached the node when it decides to adopt the route. Following the same reasoning for
proving the existence of k that chooses a route r before learning its final stable route r', we can always find two
independent faults F1 and F2 that occur to two different nodes and another node k such that some information
about F1 reaches k earlier than information about F2 does, which makes k choose a route r'' that has been
invalidated by F2 before information about F2 could reach k because of non-zero link delay. Then k will
change its route at least once more later on since r'' is an invalid route.


□ 

Therefore,  we  focus  on  the  mechanisms  as  well  as  the  impact  of  eliminating  fault-agnostic  instability; 
we  only  briefly  discuss  approaches  to  reducing  distribution-inherent  instability  in  Section  7. 

3.2  G-BGP  design 

To  eliminate  fault-agnostic  instability,  we  develop  protocol  G-BGP  that  refines  BGP  with  the  following 
mechanisms: 

Propagating  information  about  faults.  In  BGP,  fault-agnostic  instability  is  incurred  at  a  node  which 
adopts  an  invalid  route  due  to  the  lack  of  fine-grained  information  about  faults.  Therefore,  in  G-BGP, 


1 This is because the message passing delay along links is greater than zero, routing policies are not
shared among ASes, and faults that happen at different times or to different ASes are independent. The
detailed proof is the same as that for the existence of a node k that chooses a route before learning its
final stable route. For simplicity, we omit it here.
necessary fine-grained fault information is propagated when a fault occurs; when the affected nodes receive
the fault information, they are able to learn the fault or its impact and avoid using any route that is invalidated
by  the  fault.  On  the  other  hand,  fault  information  is  propagated  only  if  the  corresponding  fault  invalidates 
the  existing  route  of  some  node;  and  fault  information  is  either  not  stored  or  only  temporarily  stored  at  a 
node  for  a  bounded  time. 

When  a  fault  occurs,  the  information  being  propagated  depends  on  the  type  of  the  fault.  In  general,  as 
a  result  of  the  fault,  one  or  more  nodes  in  the  network  may  change  their  next-hops  in  forwarding  traffic,  in 
which case necessary points of channel-withdrawal or points of segment-withdrawal are propagated to the
affected nodes to reflect the fact that certain channels or route segments are not used in forwarding traffic
any  more.  In  the  case  when  all  the  nodes  in  an  AS  fail-stop,  when  the  channel  between  two  ASes  fail-stops, 
or  when  a  node  joins  the  network,  a  point  of  AS-failure,  a  point  of  channel-failure,  or  a  point  of  node-join  is 
also  propagated  respectively. 

Rejecting  obsolete  fault  information.  In  a  network,  message  passing  and  processing  delay  along  different 
routes  may  differ,  thus  a  fresher  message  containing  some  fault  information  regarding  an  AS  may  reach  a 
node  earlier  than  a  staler  message  containing  some  obsolete  fault  information  regarding  the  AS.  This  can 
lead  to  fault-agnostic  instability,  if  the  obsolete  fault  information  is  used.  To  avoid  using  obsolete  fault 
information,  G-BGP  verifies  the  freshness  of  each  piece  of  fault  information  upon  receipt,  which  is  enabled 
by enforcing a total order on all the fault information regarding the same AS that is sent out from the AS at
different times.

Localized uncertainty resolution. When a node i detects that a link (i, j) has fail-stopped with the existing
fault detection mechanisms in BGP (e.g., neighboring nodes periodically exchange Keep-Alive messages), i
cannot ascertain whether its neighbor j and the neighboring AS j.AS are up or down. This uncertainty, if left
unresolved, can lead to fault-agnostic instability. For example, at the network state q shown in Figure 1,
if a fail-stops, b can only ascertain that link (b, a) has fail-stopped, but b cannot ascertain whether a is
down. Therefore, only the point of channel-failure information signaling the fail-stop of (b, a) is propagated
to g; thus g only knows that (b, a) has fail-stopped, but g is uncertain whether a is still up and whether
[j, h, a, d] is valid. In BGP, g adopts [j, h, a, d] by “assuming without proof” that it is valid, which results in
fault-agnostic instability.

Similarly, when a node i receives a point of segment-withdrawal or a point of node-join information
which  signals  that  nodes  in  an  AS  J  other  than  i.AS  may  have  changed  routes,  fault-agnostic  instability 
can  occur,  if  i  adopts  a  route  going  through  J  by  simply  assuming  (without  proof)  that  nodes  in  J  have  not 
changed  routes. 

To  avoid  fault-agnostic  instability  caused  by  the  uncertainty  regarding  the  state  of  an  AS  or  a  route, 
G-BGP  resolves  the  uncertainty  by  gathering  proof  of  the  state  of  the  suspected  AS  or  route.  To  expedite 
potential  uncertainty  resolution  operation,  G-BGP  uses  the  mechanisms  of  “quickly  marking  suspectable 
invalid  routes”  and  “collaboratively  clarifying  state”.  Moreover,  uncertainty  resolution  in  G-BGP  is  local 
in  the  sense  that,  usually,  only  nodes  close  to  the  suspected  AS  need  to  resolve  the  uncertainty,  but  nodes 
farther  away  do  not,  which  is  the  case  especially  in  highly  connected  networks  such  as  the  Internet  [8]. 

We  elaborate  on  the  above  mechanisms  in  the  following  subsections. 


3.2.1  Propagating  fault  information  in  a  bounded  manner 

Towards  enabling  nodes  to  generate  appropriate  fault  information  in  the  presence  of  faults,  the  intra-AS 
coordination  in  BGP  is  enhanced  as  follows:  each  node  i  informs  the  other  nodes  in  its  AS  of  the  route 
of  i  itself,  the  neighboring  ASes  to  which  i  has  exported  its  route,  and  the  neighboring  ASes  to  which  i 
is  connected  via  an  up-link.  By  the  enhanced  intra-AS  coordination,  every  node  j  can  decide  (i)  whether 
there  is  another  node  in  j.AS  whose  route  goes  through  the  same  neighboring  AS  as  j,  (ii)  whether  there  is 
another  node  in  j.AS  that  has  exported  route  to  some  neighboring  AS  which  j  has  exported  route  to,  and 
(iii)  whether  the  channel  to  a  neighboring  AS  is  up. 

Then,  necessary  fault  information  is  generated  as  follows  in  the  presence  of  faults. 

Point of channel-withdrawal. When a node i changes from a route R = [J, ..., d.AS] to another non-empty
one [J', ..., d.AS] with J' ≠ J, i will not use any link between ASes i.AS and J in forwarding
traffic to d. In this case, if R is still valid (see footnote 2), but no link between J and i.AS is used by
any node in i.AS, i will generate a point of channel-withdrawal ([i.AS, J]), unless i has received it from
some other node, to signal the fact that every route going through route segment [i.AS, J] has become
invalid. For convenience, we call (i.AS, J) a withdrawn-channel.

Special cases are when an AS changes its import or export policy. When an AS X changes its import
policy such that nodes in X should not import routes from a set S of neighboring ASes, a node i in X should
generate the set of points of channel-withdrawal {([X, K]) : K ∈ S}, unless i has received it from some
other nodes. Similarly, when X changes its export policy such that nodes in X should not export routes to a
set S' of neighboring ASes, a node i in X should generate and send the set of points of channel-withdrawal
{([K', X]) : K' ∈ S'} to its external neighbors, if any, to which i has exported its route.

Point of segment-withdrawal. If some node in i.AS is still using a link between J and i.AS when a node
i changes from a valid route R = [J, ..., d.AS] to another non-empty one [J', ..., d.AS] with J' ≠ J, i
should not generate the point of channel-withdrawal ([i.AS, J]); otherwise, valid routes can be mistakenly
regarded as invalid. In this case, i calculates the set S of its neighboring ASes such that, for every K ∈ S, i
has exported route R to K, but there is no node in i.AS that has exported its route to K and is still using any
link between J and i.AS. i also calculates the set S' of its neighboring ASes such that, for every K' ∈ S',
i has exported route R to K', and there is at least one node in i.AS that has exported its route to K' and is
still using a link between J and i.AS.

Then, for every AS K ∈ S, route segment [K, i.AS, J] will not be used by any node in K after i
exports its new route to K; thus every route going through [K, i.AS, J] becomes invalid. However, for
an AS K' ∈ S', some node in K' may still use, and some other node in K' may stop using, route segment
[K', i.AS, J] after i exports its new route to K'; thus i is uncertain about the validity of routes going through
[K', i.AS, J]. To signal the above fact when S ≠ ∅ and/or S' ≠ ∅, i generates a point of segment-withdrawal
(S, S', i.AS, J, i, t), where t is the time passed since i changes its route (t is 0 initially and increases as the
point of segment-withdrawal is propagated from one node to another). If S' ≠ ∅, the uncertainty regarding
the validity of routes that go through [K', i.AS, J] for some K' ∈ S' is resolved, if need be, later at nodes
close to K' (to be discussed in Section 3.2.3). For convenience, we call [K, i.AS, J] a withdrawn-segment
for every K ∈ S; for every K' ∈ S', we call [K', i.AS, J] a suspected-segment and K' a suspected AS.
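
As a rough illustration of how a node might derive the two sets, the following Python sketch partitions the ASes that received the old route R into S (withdrawn) and S' (suspected). The `PeerState` record and the function name are hypothetical; the text only specifies which information intra-AS coordination makes available, not a data layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerState:
    """Hypothetical view of another node in i.AS, learned via intra-AS coordination."""
    next_as: str            # first AS on that node's current route
    exported_to: frozenset  # neighboring ASes that node has exported its route to


def split_export_sets(old_exports, peers, old_next_as):
    """Return (S, S') for a node that stops routing through old_next_as (J).

    old_exports: ASes to which the node had exported its old route R.
    peers: PeerState records of the other nodes in the same AS.
    """
    withdrawn, suspected = set(), set()
    for k in old_exports:
        # Is some other node in the AS still routing via J and exporting to k?
        still_used = any(
            p.next_as == old_next_as and k in p.exported_to for p in peers
        )
        (suspected if still_used else withdrawn).add(k)
    return withdrawn, suspected
```

With one peer still routing through J and exporting to K2, K2 lands in S' while every other AS that received R lands in S.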


2 In the case when R has become invalid, but the fault information at i does not invalidate R, i also regards R as “valid”. This
can  happen  when  G-BGP  is  only  partially  deployed  and  the  fault  happens  to  a  network  region  where  G-BGP  is  not  deployed. 
Point of AS-failure. When all the nodes in an AS X fail-stop, every route going through X becomes invalid.
To  signal  this  fact,  a  point  of  AS-failure  (X)  is  generated  by  every  node  j,  if  any,  that  detects  the  fail-stop  of 
X  and  whose  current  route  goes  through  X. 

A  special  case  is  when  the  destination  d  withdraws  its  address  prefix.  In  this  case,  nodes  in  d  generate 
the  point  of  AS-failure  (d),  since  the  effect  of  d  withdrawing  its  address  prefix  is  the  same  as  that  of  d 
fail-stopping. 

Point of channel-failure. When a node i with route [J, ..., d.AS] detects that its link to AS J has
fail-stopped, i will check if the channel between i.AS and J is up. If the channel is down, i knows that every
route going through route segment [i.AS, J] becomes invalid. However, i is uncertain whether its external
neighbor(s) in J is (are) up or down; thus i is uncertain whether the other channels associated with J are
up or down. To signal the above fact, i generates a point of channel-failure ([i.AS, J], t), where t is the
time passed since the channel-failure is detected. The uncertainty regarding the state of J and its associated
channels is resolved, if need be, later at nodes close to J. For convenience, we call J a suspected AS.

Point of node-join. When a node i joins a network and exports its route to a set S of neighboring ASes,
nodes in those ASes may change their routes to those going through i.AS, which is, however, uncertain to
i. To signal the above fact, i generates a point of node-join (i.AS, S, i, t), where t is the time passed since
i joins the network. The uncertainty regarding whether nodes in an AS K in S have changed their routes to
those going through i.AS is resolved, if need be, later at nodes close to K. For convenience, we call every
K in S a suspected AS.

How  G-BGP  uses  and  propagates  fault  information  in  a  bounded  manner.  As  discussed  above,  when 
a  fault  occurs,  some  node  close  to  where  the  fault  has  occurred  will  generate,  if  need  be,  the  corresponding 
fault  information.  The  newly  generated  fault  information,  if  any,  is  piggybacked  onto  the  UPDATE  messages 
that  the  node  sends  to  its  export  neighbors. 

When  a  node  i  receives  an  UPDATE  message  piggybacked  with  some  fresh  fault  information,  i  first 
modifies  the  information,  if  need  be,  as  follows: 

•  i changes every point of segment-withdrawal (S, S', X, J, i', t) where i.AS ∈ S' and every point of
node-join (K, S", k', t) where i.AS ∈ S", if any, by removing i.AS from S' and S" respectively, since
i is sure about the state of its own AS (i.e., i.AS).

•  i removes every point of channel-withdrawal ([J, i.AS]), every point of segment-withdrawal (S, S',
J, i.AS, j', t), the point of AS-failure (i.AS), every point of channel-failure ([J, i.AS], t), and every
point of node-join (i.AS, S, i', t), if any, since i will not choose any route that goes through its own
AS (i.e., i.AS).

i also removes every point of segment-withdrawal (S, S', i.AS, J, i, t), if any, that is generated by i
itself.
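
The clean-up rules above can be sketched for one kind of fault information. The Python fragment below handles only points of segment-withdrawal; the `SegmentWithdrawal` record and its field names are illustrative assumptions, not part of the protocol specification.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SegmentWithdrawal:
    withdrawn: frozenset  # S
    suspected: frozenset  # S'
    via_as: str           # AS whose node changed route
    next_as: str          # AS that the old route went through next
    origin: str           # node that generated the point
    age: float            # t


def scrub_segment_withdrawals(points, my_as, my_id):
    """Apply the receive-side clean-up rules at node my_id in AS my_as."""
    kept = []
    for p in points:
        if p.next_as == my_as:
            # The segment leads into our own AS; we never route through it.
            continue
        if p.via_as == my_as and p.origin == my_id:
            # We generated this point ourselves.
            continue
        if my_as in p.suspected:
            # We are certain about the state of our own AS.
            p = replace(p, suspected=p.suspected - {my_as})
        kept.append(p)
    return kept
```

The other kinds of fault information would follow the same pattern, each with its own removal test.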

Then, i invalidates and avoids using the routes that go through any withdrawn channel, withdrawn segment,
fail-stopped AS, and/or fail-stopped channel. Moreover, if the highest ranked candidate route R goes
through some suspected AS, i will choose R only if R is still not invalidated after i resolves the associated
uncertainty. If i changes route after processing the UPDATE message, i sends to its export neighbors
an UPDATE message piggybacked with the fault information that i knows of, then i deletes the fault
information without storing it. On the other hand, if i does not change route, i will not propagate any fault
information to its external neighbors; instead, i only propagates its newly-learned fault information to its
internal neighbors, to guarantee that nodes in the same AS have a consistent view of faults and their impact.
Therefore, information about faults only propagates to the nodes, as well as their immediate neighbors, that
change routes due to the faults; consequently, the propagation of fault information is bounded.

When  a  node  j  does  not  change  route  after  receiving  some  fault  information,  j  will  store  the  fault 
information  temporarily  for  up  to  U  time,3  where  U  is  the  upper  bound  on  the  convergence  time  of  BGP 
after  a  fault  occurs.  If  j  does  not  change  route  within  U  time  after  it  receives  the  fault  information,  j  will 
delete  it  permanently,  since  any  route  that  can  be  invalidated  by  the  fault  information  should  have  been 
invalidated  U  time  after  j  receives  the  fault  information.  (Of  course,  if  j  changes  route  within  U  time  after 
receiving  the  fault  information,  the  information  will  be  deleted  after  being  piggybacked  onto  the  UPDATE 
messages  that  j  sends  out.) 
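
A minimal sketch of this bounded storage follows, assuming a caller-supplied clock value and opaque, hashable fault-information items. The `FaultStore` class and its method names are hypothetical.

```python
class FaultStore:
    """Holds fault information for at most max_age (U) time units."""

    def __init__(self, max_age):
        self.max_age = max_age
        self._received_at = {}  # fault information -> time of first receipt

    def put(self, info, now):
        # Keep the earliest receipt time if the same item arrives again.
        self._received_at.setdefault(info, now)

    def expire(self, now):
        """Drop items held for U time or longer; return what is still stored."""
        self._received_at = {
            info: t for info, t in self._received_at.items()
            if now - t < self.max_age
        }
        return set(self._received_at)

    def flush_on_route_change(self):
        """Return everything for piggybacking onto UPDATEs, then forget it."""
        out = set(self._received_at)
        self._received_at.clear()
        return out
```

A node would call `expire` periodically and `flush_on_route_change` whenever it selects a new route.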

When  a  node  j  stores  some  fault  information,  j  will  update  the  information,  if  need  be,  as  the  network 
state  changes: 

•  When the state of its AS j.AS changes such that a node i in j.AS uses a route going through segment
[j.AS, K] for some K and i has exported the route to a set S" of neighboring ASes, j knows that
channel (j.AS, K) is up and used. Thus, j deletes the point of channel-withdrawal ([j.AS, K]), the
point of segment-withdrawal (S, S', j.AS, K, i, t), and/or the point of channel-failure ([j.AS, K], t),
if they are stored at j; moreover, j changes every point of segment-withdrawal (S, S', j.AS, K, i', t),
if any, where i' ≠ i to (S \ S", S', j.AS, K, i', t), and if S \ S" = S' = ∅ after the change, j deletes the
corresponding information.

•  When j receives an UPDATE message m that contains a route R and some fresh fault information
regarding an AS K,

- if R goes through segment [K, K'] for some K', j deletes the point of channel-withdrawal
([K, K']), the point of channel-failure ([K, K'], t), and/or the point of AS-failure (K), if they
are stored at j;

- if R goes through segment [K", K, K'] for some K" and K', j changes every point of segment-
withdrawal (S, S', K, K', k, t) where K" ∈ S, if any, by removing K" from S, and if S = S' = ∅
after the change, j deletes the corresponding information.

Then, j reliably informs other nodes in its AS of the changes.

•  For every set of points of segment-withdrawal {(S_k, S'_k, X, J, i', t_k) : k ∈ 1..n, n > 1}, if any, that
are stored at j, j integrates them into a single point of segment-withdrawal
(⋂_{k=1..n} S_k, ⋂_{k=1..n} S'_k, X, J, i', min_{k∈1..n} t_k); similarly, j integrates every set of points of
node-join {(K, S"_k, k', t_k) : k ∈ 1..n, n > 1}, if any, into a single point of node-join
(K, ⋂_{k=1..n} S"_k, k', min_{k∈1..n} t_k).

If j has a point of channel-withdrawal ([J, X]) and a point of channel-failure ([J, X], t) simultaneously
(which can happen as a result of the uncertainty resolution regarding the state of X), j deletes
([J, X], t), so that the validity of routes going through X will not be suspected due to the existence of
([J, X], t).
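
The integration step can be sketched as below, reading the merge rule as "intersect the sets and keep the smallest t". That reading, the tuple layout, and the function name are assumptions made for illustration.

```python
from functools import reduce


def merge_segment_withdrawals(points):
    """Merge points of segment-withdrawal that share the same segment and origin.

    points: non-empty list of (S, S_prime, via_as, next_as, origin, t) tuples,
    where S and S_prime are frozensets and the middle three fields agree.
    """
    s = reduce(frozenset.intersection, (p[0] for p in points))
    s_prime = reduce(frozenset.intersection, (p[1] for p in points))
    _, _, via_as, next_as, origin, _ = points[0]
    # Keep the smallest elapsed time among the merged points.
    t = min(p[5] for p in points)
    return (s, s_prime, via_as, next_as, origin, t)
```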


3  Alternatively,  we  can  assign  each  piece  of  fault  information  a  lifetime  of  U  and  decrease  its  lifetime  as  time  passes  by.  Then  a 
node  stores  fault  information  with  a  lifetime  of  t'  for  at  most  ( U  —  t')  time. 
3.2.2  Rejecting obsolete fault information

To avoid using obsolete fault information, G-BGP enforces a total order on all the fault information regarding
the same AS that is sent out from the AS at different times. This is achieved by assigning sequence numbers to
fault information such that fresher fault information regarding an AS has a larger sequence number than does
staler fault information regarding the same AS. (Unless specified otherwise, all the arithmetic operations,
including comparisons, in this section are based on “modulo some big number M”.)

Numbering fault information. To enable the sequence-number based information-freshness-checking,
nodes within an AS X coordinate with each other to maintain a monotonically-increasing sequence number
X.sn for X. For every neighboring AS J, nodes in X also maintain a local copy of J's sequence number,
denoted by X.J.sn. We assume that the synchronization delay between J.sn and X.J.sn (i.e., J.sn −
X.J.sn) is bounded from above by Dsn. To guarantee monotonicity in the sequence number of an AS, a
node stores the sequence number of its AS in a persistent memory; when a fail-stopped node i joins the
network, i either gets the sequence number of its AS from some other up-node in the AS, or, if there is no
up-node other than i in the AS, it gets the sequence number from its persistent memory and increases it by
Dsn + 1.
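
The restart rule can be written as a small helper. This is only a sketch: the function name, its parameters, and the default modulus (standing in for the "big number M") are assumptions.

```python
def restart_sequence_number(persisted_sn, d_sn, peer_sn=None, modulus=2**32):
    """Sequence number adopted by a node that rejoins after a fail-stop.

    persisted_sn: value read back from the node's persistent memory.
    d_sn: the bound Dsn on the synchronization delay.
    peer_sn: value obtained from another up-node in the AS, or None if the
             rejoining node is the only up-node.
    """
    if peer_sn is not None:
        # Another up-node already holds the authoritative value.
        return peer_sn
    # Skip past anything a neighbor's local copy could have lagged behind on.
    return (persisted_sn + d_sn + 1) % modulus
```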

When piggybacking fault information onto UPDATE messages that are sent to external neighbors, a
node i attaches the proper sequence number to each piece of fault information that is generated by i itself or
some other node in i.AS:

•  For each piece of fault information regarding the state of i.AS (i.e., a point of channel-withdrawal
([i.AS, J]), a point of segment-withdrawal (S, S', i.AS, J, i', t), the point of AS-failure (i.AS), a
point of channel-failure ([i.AS, J], t), or a point of node-join (i.AS, S, i', t)), i simply attaches the
sequence number (i.AS).sn. Then, i coordinates with other nodes in its AS to increase (i.AS).sn by
1.

•  For every point of channel-withdrawal ([K, i.AS]), if any, that is generated when i.AS changes its
export policy such that nodes in it do not export routes to an AS K, i attaches the sequence number
((i.AS).K.sn + Dsn) instead of (i.AS).sn, since ([K, i.AS]) is about the fact that nodes in K will
not use any link between K and i.AS in forwarding traffic.

When nodes in K receive ([K, i.AS]), they coordinate with one another to increase K.sn by Dsn + 1.

•  For every point of AS-failure (J), if any, that is generated for a fail-stopped neighboring AS J, i
attaches the sequence number ((i.AS).J.sn + Dsn) instead of (i.AS).sn, since (J) is about the
fail-stop of J.

For  fault  information  that  is  generated  by  nodes  outside  i.AS,  i  simply  piggybacks  the  information  onto 
UPDATE  messages  without  changing  the  sequence  number  of  the  information. 

How G-BGP rejects obsolete fault information. Towards enabling nodes in an AS X to determine the
freshness of fault information regarding another AS K, every time a node i in X receives some fresh fault
information regarding K, i reliably notifies the other nodes in X of the information, and all the nodes in X
maintain the sequence number of the information as X.K.snM for up to Td time, where Td is the maximum
difference in delay in propagating UPDATE messages along different routes from one AS to another. If no
node in X receives any fresher fault information regarding K within Td time after X.K.snM was modified
the last time, nodes in X delete X.K.snM, which we regard as “resetting X.K.snM to −∞”. At an AS X,
X.K.snM is initially −∞ for every other AS K.

When  a  node  i  receives  an  UPDATE  message  m  containing  a  route  R  and  some  fault  information,  i 
checks  the  freshness  of  each  piece  of  fault  information  and  the  validity  of  R  as  follows: 

•  For a piece of fault information regarding an AS K, if the sequence number of the fault information
is less than (i.AS).K.snM and the fault information signals a “withdrawn”-channel, a
“withdrawn”-segment, a “fail-stopped” AS, or a “fail-stopped” channel that is, however, in a candidate
route of i, then the fault information must be obsolete; otherwise, the fault information is fresh, in
which case i updates (i.AS).K.snM with the sequence number of the fault information. (Note that m may
contain obsolete and fresh fault information simultaneously.)

•  If  m  contains  any  obsolete  fault  information,  R  must  be  invalid. 

After  the  checking  above,  i  accepts  all  fresh  fault  information  and  ignores  all  obsolete  fault  information;  i 
also  accepts  the  announced  route  R  if  m  contains  no  obsolete  fault  information. 
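
The freshness check can be sketched as follows. For brevity the sketch compares plain integers (ignoring the modulo-M arithmetic) and omits the time-based reset of snM; the tuple layout, the `INVALIDATING` set, and the function name are illustrative assumptions.

```python
import math

# Kinds of fault information that invalidate routes.
INVALIDATING = {
    "channel-withdrawal", "segment-withdrawal", "as-failure", "channel-failure",
}


def check_update(faults, sn_max):
    """Classify the fault information carried by one UPDATE message.

    faults: iterable of (as_name, kind, seq_no, hits_candidate_route).
    sn_max: dict mapping an AS to its recorded snM; a missing entry means
            minus infinity. The dict is updated in place.
    Returns (fresh, obsolete, route_acceptable).
    """
    fresh, obsolete = [], []
    for as_name, kind, seq_no, hits in faults:
        older = seq_no < sn_max.get(as_name, -math.inf)
        if older and kind in INVALIDATING and hits:
            obsolete.append((as_name, kind, seq_no))
        else:
            fresh.append((as_name, kind, seq_no))
            sn_max[as_name] = seq_no
    # The announced route is accepted only if nothing obsolete was found.
    return fresh, obsolete, not obsolete
```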

3.2.3  Localized  uncertainty  resolution 

Notations: 

hops(X, J, R): the number of inter-AS hops between ASes X and J in a route R = [X, ..., J, ..., d.AS];
Tm: the upper bound on the time taken to process a message in BGP.

To  expedite  and  to  enhance  the  locality  of  potential  uncertainty  resolution,  G-BGP  uses  the  mechanisms 
of  “quickly  marking  suspectable  invalid  routes”  and  “collaboratively  clarifying  state”. 

Quickly marking suspectable invalid routes. To resolve uncertainty, a node needs to obtain proof
information from others. However, information flow can be slow in BGP due to the use of the MRAI timer. To
expedite potential uncertainty resolution after a point of segment-withdrawal or a point of node-join is
generated, purging-messages are sent, without being subject to the MRAI timer control, along invalid routes
that go through the corresponding suspected segment or AS. More specifically:

•  A node i sends a purging-message to each of its export neighbors, when either of the following
conditions holds: (i) i will change its route not to go through a segment [X, K] after i receives a point
of segment-withdrawal (S, S', X, K, j', t) where i.AS ∈ S'; (ii) i will change its route to go through
an AS X after i receives a point of node-join (X, S, j', t) where i.AS ∈ S.

•  When  a  node  j  receives  a  purging-message  from  an  import  neighbor  i,  j  marks  as  invalid  the  candidate 
route  imported  from  i;  moreover,  if  the  marked  candidate  route  is  the  current  route  of  j,  j  re-sends  a 
purging-message  to  each  of  its  export  neighbors. 

•  When  a  node  j  marks  its  route  as  invalid,  j  does  not  change  its  route  immediately;  instead,  j  waits  for 
the  normal  BGP  procedure  to  stabilize  its  route  later.  On  the  other  hand,  j  will  remove  the  marking, 
if  its  route  has  been  marked  as  invalid  for  U  time  without  being  withdrawn  or  changed  (i.e.,  the  route 
has  become  valid),  where  U  is  the  upper  bound  on  the  convergence  time  of  BGP  after  a  fault  occurs. 
Therefore, if a node i sends a purging-message to its export neighbors, another node j that has a candidate
route R = [K, ..., i.AS, ..., d.AS] will mark R as invalid within (hops(K, i.AS, R) + 1) × (Tm + Ud)
time. (Given the high-performance routers and high-speed networks in today’s Internet, both Tm and Ud
tend to be quite small.)
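
The forwarding rule for purging-messages and the resulting time bound can be sketched on a toy topology. The dictionaries describing export and import relationships, and both function names, are hypothetical.

```python
from collections import deque


def propagate_purge(origin, export_neighbors, current_import):
    """Return the nodes that mark their *current* route as invalid.

    export_neighbors: node -> list of nodes it exports routes to.
    current_import: node -> import neighbor that supplied its current route.
    """
    marked = set()
    queue = deque([origin])
    while queue:
        sender = queue.popleft()
        for receiver in export_neighbors.get(sender, []):
            # The candidate route imported from `sender` is marked in any case;
            # the purge is re-sent only if that candidate is the current route.
            if current_import.get(receiver) == sender and receiver not in marked:
                marked.add(receiver)
                queue.append(receiver)
    return marked


def marking_deadline(hops, t_m, u_d):
    """Upper bound (hops + 1) x (Tm + Ud) on the time until a route is marked."""
    return (hops + 1) * (t_m + u_d)
```

The purge stops at any node whose current route does not come from the sender, which keeps the marking confined to routes that actually depend on the suspected segment or AS.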

Collaboratively clarifying state. When a channel (i.AS, J) used by a node i goes down, i generates
a point of channel-failure ([i.AS, J], t) to signal, in addition to the fail-stop of (i.AS, J), the uncertainty
regarding the validity of routes going through some other channel associated with J. In this case, only the up
nodes in J, if any, know the exact state (i.e., up or down) of the channels associated with J. Therefore, the
up nodes in J propagate, without being subject to the MRAI timer control, state-clarifiers regarding the
channels that fail-stop simultaneously to the nodes whose candidate routes go through an up channel
associated with J. More specifically,

•  When a node j detects that a channel (j.AS, X) used by some node in a neighboring AS X fail-stops,
j first calculates the set S of ASes such that, for every X' ∈ S, j detects the fail-stop of (j.AS, X')
within Tf time before or after j detects the fail-stop of (j.AS, X), where Tf is the delay in detecting the
fail-stop of channels. (By definition, X ∈ S.) Then, j sends to its external neighbors the state-clarifier
(S, j.AS).

•  When a node k receives a state-clarifier (S, J) from another node k', k stores (S, J), if the route of k
goes through a segment [X', J] for some X' ∈ S; otherwise, if the route of k is imported from k' and
goes through a segment [K, J] for some K ∉ S, k first invalidates all of its candidate routes, if any,
that go through a segment [X", J] for some X" ∈ S, then k re-sends the state-clarifier to its export
neighbors;

•  A  node  deletes  a  stored  state-clarifier,  if  it  has  been  stored  for  U  time  without  being  used,  where  U  is 
the  upper  bound  on  the  convergence  time  of  BGP  after  a  fault  occurs. 

Therefore, if a node j sends a state-clarifier (S, j.AS) to its export neighbors, another node k that has a valid
candidate route R = [K, ..., j.AS, ..., d.AS] will receive the state-clarifier within (hops(K, j.AS, R) +
1) × (Tm + Ud) time.

How G-BGP resolves uncertainty. When the highest ranked candidate route R = [J, ..., K', K, ..., d.AS]
of a node i goes through some suspected AS K, i suspects the validity of R and resolves the associated
uncertainty, if either of the following conditions holds:4,5

•  Condition 1: i has a point of segment-withdrawal (S, S', X', J', i', t) where K ∈ S' and [K, X', J'] ∈
R, or i has a point of node-join (X", S", i', t) where K ∈ S" and [K, X"] ∉ R; but the point of segment-
withdrawal or the point of node-join is not piggybacked in the UPDATE message that contains R.

In this case, i regards R as invalid if R has already been marked as invalid, or i regards R as valid
if R has not been marked as invalid and t > a × (hops(J, K, R) + 1) × (Tm + Ud); otherwise, i


4 G-BGP changes the BGP UPDATE-send method, so that when a node i uses the route imported from one of its import neighbors
j, i also sends its route back to j. Therefore, a node j can decide whether its route is used by some of its external neighbors in
a neighboring AS. By letting nodes in the same AS share with each other this information, a node can decide whether a channel
between its AS and a neighboring AS is used by some node in the neighboring AS.

5 If there are multiple suspected ASes in R, i resolves the uncertainty, if need be, regarding these ASes in parallel.
judiciously waits for (a × (hops(J, K, R) + 1) × (Tm + Ud) − t) time, after which i either regards
R as invalid if R has been marked as invalid, or i regards R as valid otherwise.

If R is regarded as valid after the uncertainty resolution, no node whose highest ranked route R' goes
through the new route [i.AS, R] of i will suspect the validity of R, since the UPDATE message that
contains R' must also contain the point of segment-withdrawal or the point of node-join.

•  Condition 2: i has a point of channel-failure ([X', K], t) where X' ≠ K'.

In this case, i regards R as valid if i has a state-clarifier (S, K) where K' ∉ S, or i regards R as invalid
if i does not have such a state-clarifier yet t > a × (hops(J, K, R) + 1) × (Tm + Ud); otherwise, i
judiciously waits for (a × (hops(J, K, R) + 1) × (Tm + Ud) − t) time, after which i either regards
R as valid if i has a state-clarifier (S, K) where K' ∉ S, or i regards R as invalid otherwise.

If R is regarded as valid after the uncertainty resolution, i changes the state-clarifier into a set of
points of channel-withdrawal $\{([\mathcal{K}'', \mathcal{K}]) : \mathcal{K}'' \in \mathcal{S}\}$; i also deletes every point of channel-failure
$([\mathcal{K}'', \mathcal{K}], t)$ where $\mathcal{K}'' \in \mathcal{S}$, so that no node whose route goes through the route of i, whether before
or after i changes its route, will suspect the validity of routes going through $\mathcal{K}$. (Every newly
generated point of channel-withdrawal $([\mathcal{K}'', \mathcal{K}])$ that corresponds to a deleted point of channel-failure
$([\mathcal{K}'', \mathcal{K}], t)$ assumes the sequence number of $([\mathcal{K}'', \mathcal{K}], t)$; the remaining newly generated points of
channel-withdrawal do not assume any sequence numbers and are always regarded as fresh.)
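The three-way decision under Condition 2 (accept, reject, or wait) can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the names `Timing`, `resolution_deadline`, and `resolve_condition_2` are hypothetical, the state-clarifier test is abstracted into a boolean, and the parameters alpha, T_m, and U_d are simply taken as given numbers from the bound stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timing:
    """Hypothetical container for the constants in the waiting bound."""
    alpha: float  # scaling factor in the bound
    t_m: float    # T_m in the text
    u_d: float    # U_d in the text


def resolution_deadline(hops: int, timing: Timing) -> float:
    """alpha * (hops + 1) * (T_m + U_d), the bound used in Condition 2."""
    return timing.alpha * (hops + 1) * (timing.t_m + timing.u_d)


def resolve_condition_2(elapsed: float, hops: int,
                        has_clarifier: bool, timing: Timing):
    """Return ("valid", 0.0), ("invalid", 0.0), or ("wait", remaining).

    `elapsed` plays the role of t in the point of channel-failure, and
    `has_clarifier` says whether a matching state-clarifier is held
    (the membership test itself is assumed to be done by the caller).
    """
    if has_clarifier:
        return ("valid", 0.0)
    deadline = resolution_deadline(hops, timing)
    if elapsed > deadline:
        return ("invalid", 0.0)
    # Neither condition holds yet: wait for the remaining time, then re-check.
    return ("wait", deadline - elapsed)
```

After a `"wait"` result, the caller would sleep for the returned time and then decide based only on whether a state-clarifier has arrived in the meantime.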

By  the  above  method  of  uncertainty  resolution,  an  invalid  route  going  through  some  suspected  AS  will 
be  discarded,  and  fault-agnostic  instability  as  well  as  its  propagation  is  avoided.  The  elimination  of  this  type 
of  fault-agnostic  instability  is  essential  for  G-BGP  to  achieve  asymptotically  optimal  convergence  speed  or 
to  asymptotically  improve  BGP  convergence  speed  in  several  common  scenarios  (e.g.,  when  a  node  with 
multiple  neighboring  ASes  fail-stops),  as  proved  in  Section  5. 

For  a  valid  route  R  that  is  suspected  by  a  node  i,  once  i  resolves  the  uncertainty  regarding  the  validity 
of  R,  no  node  whose  highest  ranked  route  goes  through  the  new  route  of  i  will  suspect  the  validity  of  R 
any  more.  Thus,  uncertainty  regarding  a  valid  route  is  resolved  locally  in  the  sense  that  only  nodes  that  are 
relatively  close  to  the  suspected  AS  need  to  resolve  the  uncertainty,  but  nodes  that  are  farther  away  from  the 
suspected  AS  need  not. 

Moreover,  as  observed  in  [17],  the  link  latency  as  well  as  the  processing  delay  for  BGP  messages  is 
usually  significantly  less  than  the  MRAI  timer.  Therefore,  the  propagation  of  UPDATE  messages  and  the 
piggybacked  fault  information  is  much  slower  than  the  propagation  of  purging  messages  and  state-clarifiers. 
Thus,  with  high  probability,  a  node  need  not  wait  to  resolve  uncertainty  regarding  a  valid  route;  and  the 
waiting  would  be  short  even  if  need  be. 

3.3  Protocol  G-BGP 

We  present  G-BGP  in  Figure  2,  where  the  variables  and  protocol  actions  of  each  node  i  are  presented. 
(For  conciseness,  we  skip  the  program  for  “fast  marking  of  suspectable  invalid  routes”  and  “collaborative 
clarification”.) 

Variables. Each node i maintains variables i.poas, i.pocf, i.pocw, i.ponj, i.posw, i.AS-path, i.inval,
i.sn, i.SN, i.change, i.seqChg, i.suspect, i.adv, and i.pAdv:

Program G-BGP.i

Parameter  j : node-id;  J : set of AS-ids

Var  i.pocw : set of (channel, int) [initialized to ∅];
     i.posw : set of (AS-id set, AS-id set, channel, int, int, int) [initialized to ∅];
     i.poas : set of (AS-id, int) [initialized to ∅];
     i.pocf : set of (channel, int, int) [initialized to ∅];
     i.ponj : set of (AS-id, AS-id set, int, int, int) [initialized to ∅];
     i.AS-path, i.j.AS-path : sequence of AS-ids [initialized to ∅];
     i.inval : set of AS-ids [initialized to ∅];
     i.sn, i.j.sn, s : int [initialized to 0];
     i.SN : set of (AS-id, int, int) [initialized to ∅];
     r' : sequence of AS-ids;  n, k', k'' : AS-id;
     e : link (i.e., pair of AS-ids);  l : (link, int, int) or ((AS-id, AS-ids), int, int);
     t, t', wt : time;
     i.change, i.seqChg, i.suspect, i.adv, i.pAdv, i.advd : boolean [initialized to false];

Action
     FAULT-INFO
     []
     ROUTE-ADAPT
     []
     ADV-RESET

Figure 2: G-BGP: improve BGP convergence stability and speed


(A1) :: i changes routing policy →
          if removes J from i.im → i.pocw, i.change := i.pocw ∪ {((j, i.AS), i.sn) : j ∈ J}, true fi;
          if removes J from i.ex →
               i.pocw := i.pocw ∪ {((i.AS, j), i.j.sn + 1) : j ∈ J};
               send UPDATE(WD(i), i.pocw, ∅, ∅, ∅, ∅) to J
          fi;
          if i changes ranking policy → i.change := true fi
[]
(A2) :: (j = nHop(i) ∧ link (j, i) fail-stops) ∨ (i = d ∧ i withdraws its address prefix) →
          if channel (j.AS, i.AS) fail-stops → i.pocf, i.seqChg := i.pocf ∪ {((j.AS, i.AS), detectionTime, i.sn)}, true
          [] i withdraws its address prefix → i.poas, i.seqChg := i.poas ∪ {(i.AS, i.sn)}, true
          fi;
          if ¬i.change → i.change, t := true, CLOCK fi
[]
(A3) :: rcv UPDATE m(r, r.pocw, r.posw, r.poas, r.pocf, r.ponj, r.sn) from j →
          (∀l : l ∈ (r.pocf ∪ r.ponj) : l.tPsd := l.tPsd + L_d);
          if r.sn > i.j.sn → i.j.sn := r.sn fi;
          if m is not a withdrawal-UPDATE →
               i.j.AS-path := r;
               if ¬Obsolete(r, i) → i.inval := i.inval \ {j};
                    i.poas, i.pocf, i.pocw, i.posw := D(i.poas, r), D(i.pocf, r), D(i.pocw, r), D(i.posw, r)
               [] Obsolete(r, i) → i.inval := i.inval ∪ {j}
               fi
          [] m is a withdrawal-UPDATE → i.j.AS-path := ∅
          fi;
          i.SN, i.sn, i.seqChg := adpt(i.SN, r), adpt(i.sn, r, j), chgSeq(i, r);
          if i.AS-path ≠ ∅ →
               i.poas, i.pocf, i.pocw := M(i.poas, r.poas), M(i.pocf, r.pocf), M(i.pocw, r.pocw);
               i.posw, i.ponj := M(i.posw, r.posw), M(i.ponj, r.ponj)
          fi;
          (∀k', k'', s, l : ((k', k''), s) ∈ i.pocw ∧ l ∈ i.ponj ∧ k'' ∈ l.ASes ∧ s > l.k''.sn : l.ASes := l.ASes \ {(k'', l.k''.sn)});
          if ¬i.change → i.change, t := true, CLOCK fi

Figure 3: Module FAULT-INFO

(A4) :: i.change →
          (∀l : l ∈ (i.pocf ∪ i.ponj) : l.tPsd := l.tPsd + (CLOCK − t)/α);
          i.inval := (i.inval ∪ {j : Invalid(i.j.AS-path, i)}) \ {j : ¬Invalid(i.j.AS-path, i)};
          if mPref(i) ≠ ⊥ → i.AS-path, i.adv := [i.AS, mPref(i)], false;
               if (∃k' : Suspect(k', i)) ∧ (¬i.suspect ∨ modified(i)) →
                    i.suspect, t, wt, i.pAdv := true, CLOCK, max{0, 2α × maxWait(i)}, true
               [] ¬(∃k' : Suspect(k', i)) → i.adv := true
               fi
          [] mPref(i) = ⊥ → i.AS-path, i.adv := ∅, true
          fi
[]
(A5) :: i.suspect →
          if CLOCK < t + wt ∧ Invalid(i.AS-path, i) → i.suspect := false
          [] CLOCK > t + wt ∧ ¬Invalid(i.AS-path, i) →
               i.suspect := false;
               (∀k', k'', t', s : k' ∈ i.AS-path ∧ ((k', k''), t', s) ∈ i.pocf :
                    i.pocw, i.pocf := i.pocw ∪ {((k', k''), s)}, i.pocf \ {((k', k''), t', s)})
          fi

Figure 4: Module ROUTE-ADAPT


(A6) :: (i.adv ∨ i.pAdv) ∧ mrai(i) →
          if i.pAdv ∧ ¬i.advd → i.pAdv, i.advd := false, true;
               send UPDATE(WD(i), i.pocw, i.posw, i.poas, i.pocf, i.ponj, i.sn) to (i.ex ∪ {preNHop(i)})
          [] i.adv →
               if diff(i) ∧ i.AS-path ≠ ∅ →
                    if nHop(i) ≠ preNHop(i) ∧ preNHop(i) ∉ i.inval →
                         if channel (preNHop(i), i.AS) is not used →
                              i.pocw, i.seqChg := i.pocw ∪ {((preNHop(i), i.AS), i.sn)}, true
                         [] channel (preNHop(i), i.AS) is still used →
                              i.posw, i.seqChg := i.posw ∪ {(S(i), S'(i), i.AS, preNHop(i), i.AS, i, 0, i.sn)}, true
                         fi
                    fi;
                    if preNHop(i) = ⊥ → i.ponj, i.seqChg := i.ponj ∪ addPoj(i), true fi;
                    send UPDATE(i.AS-path, i.pocw, i.posw, i.poas, i.pocf, i.ponj, i.sn) to (i.ex ∪ {nHop(i)})
               [] diff(i) ∧ i.AS-path = ∅ →
                    send UPDATE(WD(i), i.pocw, i.posw, i.poas, i.pocf, i.ponj, i.sn) to (i.ex ∪ {nHop(i)})
               fi
          fi;
          if i.seqChg → i.sn := i.sn + 1 fi;
          i.poas, i.pocf, i.pocw, i.ponj, i.sn := ∅, ∅, ∅, ∅, i.sn + 1;
          i.change, i.seqChg, i.suspect, i.adv, i.pAdv, i.advd := false, false, false, false, false, false
[]
(A7) :: i.SN ≠ ∅ → (∀k, s, t : (k, s, t) ∈ i.SN ∧ CLOCK > t + T_d : i.SN := i.SN \ {(k, s, t)})


Figure 5: Module ADV-RESET

• i.poas, i.pocf, i.pocw, i.ponj, and i.posw denote the point(s) of AS-failure, point(s) of channel-failure,
point(s) of channel-withdrawal, point(s) of node-join, and point(s) of segment-withdrawal
that i has respectively.

• i.AS-path denotes the current route of i, and i.inval denotes the set of import neighbors of i whose
routes are invalid.

• i.sn denotes the local sequence number of i, and i.SN contains the sequence number of the latest
information about faults with respect to other ASes that has reached i. An element in i.SN that
records the sequence number for the latest information about faults with respect to an AS j is deleted
from i.SN if i has not received from j any information about faults for a period of T_d time.

• i.change denotes whether the network state has changed; i.seqChg denotes whether the sequence
number of i needs to increase after i adapts its route to network state changes; i.suspect denotes
whether i is in the process of resolving some uncertainty; i.adv denotes whether i is going to send out
UPDATE messages; i.pAdv denotes whether i will send out a withdrawal-UPDATE message that
piggybacks information about faults and/or route changes before i tries to resolve some uncertainty;
and i.advd denotes whether i has sent out a withdrawal-UPDATE after i.pAdv is set to true.

i.poas, i.pocf, i.pocw, i.ponj, i.posw, i.AS-path, i.inval, and i.SN are initialized to ∅; i.change,
i.seqChg, i.suspect, i.adv, and i.pAdv are initialized to false.
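The ageing of i.SN described above (and performed by action A7 in Figure 5) amounts to dropping every entry that has not been refreshed for T_d time. A minimal sketch, assuming entries are stored as `(as_id, seq_no, last_update_time)` tuples; the function name `purge_stale` and the tuple layout are illustrative choices, not part of the protocol text.

```python
def purge_stale(sn_table: set, clock: float, t_d: float) -> set:
    """Keep only entries (k, s, t) of i.SN with clock <= t + T_d.

    Mirrors the guard of A7, which removes (k, s, t) when CLOCK > t + T_d.
    """
    return {(k, s, t) for (k, s, t) in sn_table if not clock > t + t_d}
```

For example, with `t_d = 50` and `clock = 100`, an entry last refreshed at time 0 is dropped while one refreshed at time 90 is kept.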

Moreover, for every neighboring node/AS j of i, i uses i.j.AS-path and i.j.sn to maintain a local copy
of the value of j.AS-path and j.sn respectively. For convenience, i.im and i.ex are used to denote the set
of import neighboring ASes and the set of export neighboring ASes of the AS of i; temporary variables r',
n, k', k'', e, l, s, t, t', and wt are also used.

Actions. For clarity of presentation, we define the following notations:

WD(i): used in a BGP UPDATE message to withdraw the last route i advertised to its export neighbors;
nHop(j, r): the AS in route r that is one hop closer to d than j is in r, i.e., the next hop of j in route r;
nHop(i): nHop(i, i.AS-path);
preRoute(i): the last route used by i before it adopts its current route;
preNHop(i): nHop(i, preRoute(i)). If preRoute(i) = ∅ (i.e., i has no route previously), preNHop(i) = ⊥;
CLOCK: the current value of system clock at an AS;
Invalid(r, i): r = ∅ ∨ (∃n : n ∈ r ∧ n ∈ i.poas) ∨ (∃e : e ∈ r ∧ e ∈ (LK(i.pocf) ∪ LK(i.pocw) ∪ LK(i.posw))),
     where LK(i.pocf) = {(a, b) : ((a, b), t') ∈ i.pocf}, LK(i.pocw) = {(a, b) : ((a, b), s') ∈ i.pocw},
     LK(i.posw) = {(a, b) : (S, S', b, a, i', t, s) ∈ i.posw};
l.gAS: k if l is (k, s), ((k', k), t, s), ((k', k), s), (S, S', k, k', i', t, s), or (k, ASes, t, s);
l.ASes: ASes if l is a point of segment-withdrawal (ASes', ASes, I, J, i', t', s) or a point of node-join (n, ASes, t', s);
l.L: (k', k) if l is ((k', k), t, s) or ((k', k), s);
l.sn or l.k.sn: s if l is (k, s), ((k', k), t, s), ((k', k), s), or (k, ASes, t, s);
l.k.sn: s if l is (n, ASes, t, s') and (k, s) ∈ l.ASes;
l.tPsd: t' if l is a point of channel-failure ((a, b), t', s) or a point of node-join (n, ASes, t', s);
IN(l, r): l ∈ r.poas ∨ l ∈ r.pocf ∨ l ∈ r.pocw ∨ l ∈ r.ponj;
Obsolete(r, i): (∃j, s : (j, s) ∈ i.SN ∧ (∃l : IN(l, r) ∧ l.gAS = j ∧ l.j.sn < s)) ∨
     (∃k, j : k ∈ i.inval ∧ j ∈ r ∧ j ∈ k.AS-path ∧ nHop(j, k.AS-path) ≠ nHop(j, r) ∧
     ¬(∃l : IN(l, r) ∧ l.gAS = j));
D(i.poas, r): i.poas \ {(k, s) : (k, s) ∈ i.poas ∧ k ∈ r};
D(i.pocf, r): i.pocf \ {((k, k'), t, s) : ((k, k'), t, s) ∈ i.pocf ∧ (k, k') ∈ r};
D(i.pocw, r): i.pocw \ {((k, k'), s) : ((k, k'), s) ∈ i.pocw ∧ (k, k') ∈ r};
D(i.posw, r): i.posw \ {(S, S', k, k', i', t, s) : (S, S', k, k', i', t, s) ∈ i.posw ∧ (k, k') ∈ r};
adpt(i.SN, r): {(k, s, t) : (k, s, t) ∈ i.SN ∧ ¬(∃l : IN(l, r) ∧ l.k.sn > s)} ∪
     {(k, l.k.sn, CLOCK) : (k, s, t) ∈ i.SN ∧ IN(l, r) ∧ s < l.k.sn} ∪
     {(l.gAS, l.(l.gAS).sn, CLOCK) : IN(l, r) ∧ ¬(∃s, t : (l.gAS, s, t) ∈ i.SN)};
adpt(i.sn, r, j): max{i.sn, max_s{s : ((j, i), s) ∈ r.pocw}, max_s{s : (j, ASes, t, s) ∈ r.ponj ∧ (i, s') ∈ ASes}}
     (note: max_s{s : FALSE} = −∞);
chgSeq(i, r): (∃s, l : ((j, i), s) ∈ r.pocw ∨ (l ∈ r.ponj ∧ (i, s) ∈ l.ASes));
i.k.SN: s if (k, s, t) ∈ i.SN;
M(i.poas, r.poas): {l : l ∈ (i.poas ∪ r.poas) ∧ l.sn = i.(l.gAS).SN};
M(i.pocf, r.pocf): {l : l.sn = i.(l.gAS).SN ∧ l ∈ (i.pocf ∪ r.pocf) ∧ ¬(∃l', l'' : l' ∈ i.pocf ∧ l'' ∈ r.pocf ∧ l' ≠ l'' ∧ l'.L = l''.L = l.L)} ∪
     {(e, max{t1, t2}, s) : l' ∈ i.pocf ∧ l'' ∈ r.pocf ∧ e = l'.L = l''.L ∧ l'.sn = l''.sn = i.(l'.gAS).SN};
M(i.pocw, r.pocw): {l : l ∈ (i.pocw ∪ r.pocw) ∧ l.sn = i.(l.gAS).SN};
M(i.posw, r.posw): {l : l ∈ (i.posw ∪ r.posw) ∧ l.sn = i.(l.gAS).SN};
M(i.ponj, r.ponj): {l : l ∈ (i.ponj ∪ r.ponj) ∧ l.sn = i.(l.gAS).SN ∧ ¬(∃l' : l' ∈ (i.ponj ∪ r.ponj) ∧ l'.gAS = l.gAS)} ∪
     {l : l ∈ (i.ponj ∪ r.ponj) ∧ l.sn = i.(l.gAS).SN ∧ (∃l' : l' ∈ (i.ponj ∪ r.ponj) ∧ l'.gAS = l.gAS ∧ l'.sn < l.sn)} ∪
     {(n, l.ASes ∩ l'.ASes, max{l.tPsd, l'.tPsd}, l.sn) : l ∈ i.ponj ∧ l' ∈ r.ponj ∧
     l.gAS = l'.gAS = n ∧ l.sn = l'.sn = i.n.SN};
mPref(i): the route of an import neighbor of i that is valid and highest ranked at i;
     mPref(i) = ⊥ if none of the import neighbors of i has a valid route;
nsd(nHop(i), l): node nHop(i) does not send the fault information l to i;
Suspect(k', i): k' ∈ i.AS-path ∧ ((∃k'', t' : ((k', k''), t') ∈ i.pocf) ∨
     (∃l, s, s' : l ∈ i.ponj ∧ (k', s) ∈ l.ASes ∧ l.gAS ∉ i.AS-path ∧ nsd(nHop(i), l)) ∨
     (∃l : l = (S, S', I, J, i', t, s) ∧ l ∈ i.posw ∧ k' ∈ S' ∧ [k', I, J] ∈ mPref(i) ∧ nsd(nHop(i), l))),
     i.e., Suspect(k', i) = true if k' is a suspected AS for i;
modified(i): variable i.AS-path is modified in the current instance of A4 execution;
MRAI: the MRAI timer used in BGP;
Pol(k', i): a point of channel-failure ((k', n), t) ∈ i.pocf such that t = max{l.tPsd : l ∈ i.pocf ∧ (∃k'', t' : ((k', k''), t') = l)};
Poj(k', i): a point of node-join l' such that k' ∈ l'.ASes and l'.tPsd = max{l.tPsd : l ∈ i.ponj ∧ k' ∈ l.ASes};
hops(k', i): the number of hops between k' and i in i.AS-path;
wtPol(k', i): hops(k', nHop(i)) × (T_m + U_d) − Pol(k', i).tPsd;
wtPoj(k', i): hops(k', i) × (T_m + U_d) − Poj(k', i).tPsd;
maxWait(i): max{max{wtPol(k', i) : Suspect(k', i) ∧ Pol(k', i) ≠ ⊥}, max{wtPoj(k', i) : Suspect(k', i) ∧ Poj(k', i) ≠ ⊥}};
advInfo(i): i.suspect ∧ Invalid(preRoute(i), i) ∧
     ¬(∃k' : k' ∈ preRoute(i) ∧ ((Suspect(k', i) ∧ wtPol(k', i) = maxWait(i)) ∨
     (∃k'' : Suspect(k'', i) ∧ k' ∈ Poj(k'', i).ASes ∧ wtPoj(k'', i) = maxWait(i))));
mrai(i): i did not send any UPDATE message within the past MRAI time;
diff(i): i.AS-path differs from the last route i has used;
S(i): the set of export neighboring ASes of i where there is no node whose route goes through [i.AS, preNHop(i)];
S'(i): the set of export neighboring ASes of i where there may be no node whose route goes through [i.AS, preNHop(i)];
addPoj(i): {(i.AS, j, i, 0, i.sn) : j ∈ i.ex ∧ j ≠ nHop(i)}.

G-BGP consists of three submodules: FAULT-INFO, ROUTE-ADAPT, and ADV-RESET, which are
shown in Figures 3, 4, and 5 respectively:

• FAULT-INFO consists of actions A1, A2, and A3 that generate and propagate information about
faults and network state changes, and it implements the ideas of numbered-grapevining and obsolete
information removal as discussed in Sections 3.2.1 and 3.2.2.

• ROUTE-ADAPT consists of actions A4 and A5 that adapt the routes of ASes according to faults and
state changes, and it implements the ideas of uncertainty resolution as discussed in Section 3.2.3.

• ADV-RESET consists of actions A6 and A7 that send out BGP UPDATE messages and reset protocol
variables.
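As an illustration of the obsolete-information removal that FAULT-INFO performs, the sketch below mimics the simplest merge operator, M(i.poas, r.poas): a point of AS-failure (k, s) from either the local set or the received UPDATE is kept only if its sequence number equals the latest one recorded for AS k. This is a simplified reading for illustration; `merge_poas` and the dictionary `latest_sn` (standing in for the lookup i.k.SN) are hypothetical names.

```python
def merge_poas(local: set, received: set, latest_sn: dict) -> set:
    """Keep points (k, s) whose sequence number s is the latest known for AS k.

    `latest_sn` maps an AS id to the sequence number recorded in i.SN;
    points about ASes with no recorded number are dropped in this sketch.
    """
    return {(k, s) for (k, s) in local | received if latest_sn.get(k) == s}
```

Here a stale point such as `("A", 1)` is discarded once `latest_sn["A"]` has advanced to 2, which is how old fault information is prevented from spreading further.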

3.4  Example  revisited 

We revisit an example discussed in Section 3.1 and see how the network behaves if G-BGP is used.
If a fail-stops when the network is at the state q shown in Figure 1, b will detect the fail-stop of (b, a)
and generate a point of channel-failure ([b, a], t). ([b, a], t) is piggybacked onto UPDATE messages and
propagated towards g. When g receives the route-withdrawal UPDATE message from m, g will learn,
via ([b, a], t), of the fail-stop of (b, a) and will not adopt [f, b, a, d], even if f has not withdrawn the route.
Moreover, since route [j, h, a, d] goes through the suspected node a, g will resolve the uncertainty regarding
the validity of [j, h, a, d]. By the uncertainty resolution, g will regard [j, h, a, d] as invalid (possibly well
before j withdraws [j, h, a, d], since the uncertainty resolution is based on information flow speed that is
not subject to the MRAI timer control). Then g changes its route directly to [c, w, d]. Therefore, there is no
instability or instability propagation during the convergence.
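The first step of this example, g rejecting a route that crosses a link known to have failed, can be sketched with a simplified form of the Invalid(r, i) predicate. The sketch assumes routes are lists of AS names, failed links are ordered pairs written as in the example's ([b, a], t), and sequence numbers and timestamps are omitted; `route_links` and `is_invalid` are hypothetical helper names.

```python
def route_links(route: list) -> set:
    """Consecutive AS pairs along a route, e.g. [f, b, a, d] -> (f,b), (b,a), (a,d)."""
    return set(zip(route, route[1:]))


def is_invalid(route: list, failed_ases: set, failed_links: set) -> bool:
    """Simplified Invalid: empty route, a failed AS on it, or a failed link on it."""
    if not route:
        return True
    if any(n in failed_ases for n in route):
        return True
    return bool(route_links(route) & failed_links)
```

With `failed_links = {("b", "a")}`, the route `["f", "b", "a", "d"]` is invalid immediately, whereas `["j", "h", "a", "d"]` is not: it only passes through the suspected AS a, which is why g must go through uncertainty resolution before discarding it.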

4  Policy  graph:  concepts  and  properties 

In  this  section,  we  first  define  the  concept  of  policy  graph  for  modelling  inter-AS  routing,  then  we  present 
some  properties  of  policy  graph. 

4.1  Concepts 

In  inter-AS  routing,  both  network  topology  and  export  as  well  as  import  policies  of  nodes  affect  the  routes 
available  in  a  network.  We  define  the  concept  of  policy  graph  to  model  the  above  three  aspects  of  a  network. 
Policy  graph  is  used  to  analyze  the  convergence  properties  of  G-BGP  and  BGP  in  Section  5. 

Given a state q of a network G and the destination d in V.q, the policy graph at state q, denoted by $G_p.q$,
is a directed graph $(V_p.q, E_p.q)$, where

$V_p.q = \{ i : i \in V.q \wedge (\exists j : (j, i) \in E.q \wedge j \text{ exports route to } i \wedge i \text{ imports route from } j) \}$

$E_p.q = \{ (j, i) : j \in V_p.q \wedge i \in V_p.q \wedge j \text{ exports route to } i \wedge i \text{ imports route from } j \}$

4.2  Properties  of  policy  graph 

In this subsection, we present the complexity of computing the policy graph at a state q and an observation
on the structure of policy graphs.

Computational complexity. To compute the policy graph $G_p.q(V_p.q, E_p.q)$ for a network G and destination
d at state q, we use the breadth-first graph search algorithm [7]: the search starts with d and visits every node
in G.q one by one; when the searching process visits a node i, i is added to $V_p.q$, the export neighbors of i
in the policy graph (i.e., $EX(i, G_p.q)$) are calculated by checking the routing policies of i and its neighbors
in G.q, the set of edges $\{(i, j) : j \in EX(i, G_p.q)\}$ is added to $E_p.q$, and every node in $EX(i, G_p.q)$ that
has not been visited is added to the list of nodes to be visited (see Figure 6 for a detailed description of the
algorithm). Since the breadth-first exploration of G.q takes $O(|V.q| + |E.q|)$ time, the above procedure
to compute the policy graph takes $O(|V.q| + |E.q|)$ time too.⁶ This result is formalized in Lemma 1.

Policy-graph(G.q, d, P.q)
     do each j ∈ V.q →
          j.color := white
     od;
     Q := {d};
     Vp.q, Ep.q := ∅, ∅;
     do Q ≠ ∅ →
          j := Q.head;
          Vp.q := Vp.q ∪ {j};
          do each i such that (j, i) ∈ E.q →
               if j exports its route to i ∧ i imports routes from j →
                    Ep.q := Ep.q ∪ {(j, i)};
                    if i.color = white → i.color := black; Enqueue(Q, i) fi
               fi
          od;
          Dequeue(Q)
     od;
     return (Vp.q, Ep.q)

Figure 6: Algorithm to compute policy graph Gp.q(Vp.q, Ep.q)

Lemma 1 (Complexity of computing policy graph) It takes $O(|V.q| + |E.q|)$ time to compute the policy
graph $G_p.q(V_p.q, E_p.q)$ at a network state q.

This is in contrast to the exponential computational complexity for the dispute digraph, which is used in
[12] and [22].
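The breadth-first construction of Figure 6 can be written as a short runnable sketch. It assumes the topology is given as an adjacency dictionary and that the routing policies are supplied as two predicates, `exports(j, i)` and `imports(i, j)`; these interfaces and the function name `policy_graph` are assumptions made for illustration. One small deviation from the figure: the destination is marked as visited up front so that it is never enqueued twice.

```python
from collections import deque


def policy_graph(edges: dict, d, exports, imports):
    """Return (Vp, Ep) reachable from d along export/import-compatible edges.

    edges: node -> iterable of neighbouring nodes in G.q
    exports(j, i): True if j exports its route to i
    imports(i, j): True if i imports routes from j
    """
    visited = {d}            # plays the role of colour = black
    queue = deque([d])
    vp, ep = set(), set()
    while queue:
        j = queue.popleft()
        vp.add(j)
        for i in edges.get(j, ()):
            if exports(j, i) and imports(i, j):
                ep.add((j, i))
                if i not in visited:
                    visited.add(i)
                    queue.append(i)
    return vp, ep
```

Each node is dequeued once and each adjacency entry is inspected once, which matches the $O(|V.q| + |E.q|)$ bound of Lemma 1 provided the two policy predicates run in constant time.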

Structural property. The structure of a policy graph depends on the network topology and the routing policy
adopted at each AS. We analyze them as follows.

Due to historical and commercial reasons, the Internet topology is a hierarchical one with meshed
interconnections among entities at various tiers [16, 9, 18]. On one hand, different types of ISPs, such as local
ISPs, regional ISPs, national ISPs, and transit (or international) ISPs, with customer-provider relationships
provide network infrastructures to form the Internet, and the resulting Internet takes the form of a hierarchy
of tiers in the sense that networks or ASes of local ISPs attach to those of their regional ISPs, ASes of regional
ISPs attach to those of their national ISPs, and ASes of national ISPs attach to those of transit ISPs. Thus, the
tiers of the Internet hierarchy from lower to higher tiers are ASes of local ISPs, regional ISPs, national ISPs,
and transit ISPs. On the other hand, due to business pressure, ASes of ISPs form peering relationships
with those of other ISPs in their geographical neighborhood to provide transit service for one another,⁷ thus
a rich mesh of interconnection at various tiers (i.e., local, regional, national, and transit ISPs) exists [16].
Therefore, the policy graphs for the Internet are hierarchical ones with a rich mesh of interconnection at each
tier of the hierarchy.

6 In contrast, it takes exponential time to compute the dispute digraph, which is used to analyze the convergence speed of BGP in
[22].

Moreover, [18] found that lower tier ISPs tend to possess a higher degree of peering interconnectivity
than higher tier ISPs, which means that the peering-meshes among lower tier ISPs tend to be more connected
than those among higher tier ISPs. However, [11] found that the average degree of the AS-level Internet
topology is small⁸ and is between 2.6 and 2.9. Therefore, the average degree of an AS is small, and the higher
an AS is in the Internet hierarchy, the smaller its degree tends to be. These properties hold for the policy
graphs of the Internet too.

In terms of routing policy, most ASes today import every route they hear from their neighboring ASes
and do not impose any filtering [18]. The export policy adopted at each AS depends on its relationship with
its neighboring ASes: an AS i exports to its provider ASes and peering ASes only the set of routes that
either belong to i or are received from the customer ASes of i; i exports every route it knows to its customer
ASes [19]. Therefore, given the destination d, the policy graphs of the Internet are directed hierarchical
ones: starting at d, the directed edges go upwards first to reach the provider ASes $P_1$ of d, then to the provider
ASes of $P_1$, and so on until reaching the transit ASes T; then the directed edges go downwards to reach the
customer ASes $C_T$ of T, then to the customer ASes of $C_T$, and so on until reaching the local ASes L; for
the set of ASes that are either direct or indirect providers of d, there are bidirectional edges between peers or
customer-provider pairs; for the set of ASes that are neither direct nor indirect providers of d, usually there
is no edge between peers and only directed edges from a provider to its customers.
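The export rule stated above can be captured in a few lines. The sketch below is only an illustration of that rule: the relationship labels `"self"`, `"customer"`, `"peer"`, and `"provider"` and the function name `should_export` are hypothetical, and real export policies may add further filtering.

```python
def should_export(learned_from: str, to: str) -> bool:
    """Decide whether an AS exports a route to a neighbour.

    learned_from: "self" (own route), "customer", "peer", or "provider"
    to: relationship of the neighbour receiving the route:
        "customer", "peer", or "provider"
    """
    if to == "customer":
        # Every known route is exported to customers.
        return True
    # Providers and peers only see own routes and customer routes.
    return learned_from in ("self", "customer")
```

Applying `should_export` to every adjacent pair of ASes yields exactly the "up, then down" edge pattern of the policy graph: routes climb through providers, may cross a peering edge only from the customer side, and then descend to customers.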

Therefore, the policy graphs of the Internet are directed hierarchical ones with meshed interconnections
at various tiers of the hierarchy, and the average degree of ASes in the meshes of lower tiers tends to be larger
than that of higher tiers. An example 3-tier policy graph is shown in Figure 7.

The above observations are formalized in Proposition 1 as follows.

Proposition 1 (Directed hierarchy of policy graph) The policy graphs of the Internet are directed hierarchical
ones with meshed interconnections at various tiers of the hierarchy, and the average degree of ASes
in the meshes of lower tiers tends to be larger than that of higher tiers.

7 This is usually achieved through Network Access Points (NAPs), which are also referred to as Commercial Internet Exchanges
(CIXs), Metropolitan Area Exchanges (MAEs), or Federal Internet Exchanges (FIXs) according to context.

8 More specifically, 87% of ASes have degrees between 1 and 3, 9% of ASes have degrees between 4 and 9, 3.1% of ASes have
degrees between 10 and 27, and 0.9% of ASes have degrees larger than 28.

5 Analysis of G-BGP

Given a network topology G'(V', E') and a set P' of routing policies for nodes in V', we let
$\mathcal{L} = (\forall i : i \in V' \Rightarrow LH.i)$, where LH.i is defined as

$(i = d \Rightarrow i.AS\text{-}path = \emptyset) \wedge (i \neq d \Rightarrow ((d \in V' \Rightarrow i.AS\text{-}path = [i.AS, mPref(i)]) \wedge (d \notin V' \Rightarrow i.AS\text{-}path = \emptyset)))$

where

mPref(i) = the highest ranked candidate route of i.

Then, every state in $\mathcal{L}$ is a state where each node in V' has chosen its highest-ranked candidate route, and
every state in $\mathcal{L}$ is a stable state of G-BGP where no action of G-BGP is enabled.

In the presence of the faults discussed in Section 2, three events can occur in a network: $T_{down}$, $T_{up}$, and
$T_{change}$. $T_{down}$ occurs when the destination d fail-stops (including d withdrawing its address prefix); $T_{up}$
occurs when d newly joins the network; and $T_{change}$ occurs when d is up, but some node needs to change
route as a result of some fault. Using policy graphs, we comparatively study the convergence properties (i.e.,
stability and speed) of G-BGP and BGP under different event or fault scenarios; we also study the impact of
route ranking policies.

5.1  Convergence  stability 

In the case of $T_{down}$, we have

Lemma 2 (Convergence stability after $T_{down}$) When $T_{down}$ occurs, for both the SPF and non-SPF policies,
G-BGP converges with no fault-agnostic or distribution-inherent instability;⁹ both fault-agnostic and
distribution-inherent instability can occur during BGP convergence.

9 The fact that G-BGP converges is proved in Section 5.2.

Proof: When $T_{down}$ occurs, there are several different cases, depending on whether d announces that it will fail-stop
before it actually fail-stops, and whether d has one or more than one export neighbor before it fail-stops.

We call the case of $T_{down}$ where d withdraws its address prefix or announces that it will fail-stop before it
actually fail-stops graceful $T_{down}$, and the case of $T_{down}$ where d does not announce that it will fail-stop
before it fail-stops gross $T_{down}$.

We analyze the convergence property of G-BGP as follows.

• In the case of graceful $T_{down}$, a point of AS-failure (d) will be propagated along UPDATE messages.

When an AS i receives an UPDATE message with the point of AS-failure (d), i will add (d) to
i.poas by executing action A3. Since the current route and all the candidate routes of i include d, the
execution of action A4 at i will invalidate the current route and every candidate route of i. Therefore,
after an AS i receives an UPDATE message, i will withdraw its route to d and set i.AS-path to ∅ by
executing action A6. Moreover, when i withdraws its route by action A6, i also propagates the point
of AS-failure (d) to its export neighbors.

The above situation happens to every AS i in the network. Therefore, all the UPDATE messages that
are propagated in the network after $T_{down}$ are withdrawal-UPDATE messages, and every AS
will change (i.e., withdraw) its route only once before the network converges. Thus no fault-agnostic
or distribution-inherent instability can occur during G-BGP convergence.

• In the case of gross $T_{down}$ where d only has one export neighbor, a point of channel-failure ((d, d'), timePassed)
will be propagated, where d' is the only export neighbor of d. For every AS i in the network, its current
route and candidate routes must all go through link (d, d'). Therefore, the convergence behavior of
G-BGP in this case is the same as that in graceful $T_{down}$, and no fault-agnostic or distribution-inherent
instability can occur during G-BGP convergence.

• In the case of gross Tdown where d has multiple export neighbors d'_0, ..., d'_m, multiple points of
channel-failure ((d, d'_v), timePassed) (v = 0, ..., m) will be propagated. For every AS i, its current
route and candidate routes must go through link (d, d'_v) for some v ∈ [0, m]. When i receives an
UPDATE message, it will add the point(s) of channel-failure and point(s) of route-change carried
in the UPDATE message to i.pocf and i.pocw respectively by executing action A3. Then, after
executing action A4, if there is still some candidate route r_j that has not been invalidated and goes
through link (d, d'_j), i will enter the process of resolving uncertainty between link failure and node
failure to check whether link (d, d'_j) has fail-stopped too. During the waiting period of uncertainty
resolution at i, the point of channel-failure ((d, d'_j), timePassed) will reach i and be added to i.pocf,
after which i will invalidate route r_j by executing action A4. This process continues until there is
no valid candidate route left for i, at which point i will withdraw its current route and propagate
the set of points of channel-failure it has learned to its export neighbors.

The above situation happens to every AS i in the network. Therefore, all the UPDATE messages that
are propagated in the network after Tdown are withdrawal-UPDATE messages. Therefore, every AS
will change (i.e., withdraw) its route only once before the network converges. Thus no fault-
agnostic or distribution-inherent instability can occur during G-BGP convergence.

In BGP, both fault-agnostic and distribution-inherent instability may happen after Tdown, as
discussed in Section 3.1 and [17].

In the case of Tup, we have

Lemma 3 (Convergence stability after Tup) When Tup occurs,

(i) For the SPF policy, G-BGP and BGP converge with no fault-agnostic instability. Furthermore, if
message passing delay is proportional to the number of hops a message passes, G-BGP and BGP
converge with no distribution-inherent instability.

(ii) For non-SPF policies, G-BGP converges with no fault-agnostic instability, but fault-agnostic
instability can occur in BGP; distribution-inherent instability may happen in both G-BGP and BGP.

Proof: In the case of the SPF policy, whenever an AS i changes its route in G-BGP or BGP, it changes
to a shorter route. Therefore, if i changes its route from that learned from one of its import neighbors j
to that learned from another import neighbor j', then, whether or not the current knowledge of i about the
actual AS-path of the route via j' is correct, the route via j' will always be shorter than that via j
unless distribution-inherent instability occurs at i. Thus, if distribution-inherent instability does not occur at
i, the route i chooses in the final stable network state will be via j' instead of j. Therefore, fault-agnostic
instability will not happen in G-BGP or BGP. Furthermore, if message passing delay is proportional to
the number of hops a message has passed, then an AS i always learns the shortest path from d to i first.
Therefore an AS i will not change its route anymore once it has learned a route that is a shortest-path
route to d in the case of the SPF policy. Thus, no instability is incurred in either G-BGP or BGP in
this case.

We prove that, if message passing delay is not proportional to the number of hops a message has passed,
distribution-inherent instability can happen in BGP and G-BGP in the case of the SPF policy as follows.
We consider two ASes i and i', with i being farther away from d than i' is. Suppose i' can reach d via
two routes r1 and r2, with r2 longer than r1. However, i' learns route r2 earlier than r1 due to different
delays along different routes. Later, i learns from one of its import neighbors j a route that contains r2 and
sets [i, j, ..., i', r2] as its route to d. After this, i' learns r1 and changes its route to r1, and thus the kind
of distribution-inherent instability where the adopted route is valid but transient happens at this moment.
Suppose that i learns another route r3 via another import neighbor j' before i learns the route change at i',
and that r3 is shorter than [i, j, ..., i', r2] but longer than [i, j, ..., i', r1]. Then i will change its route to r3
even though r3 is longer than [i, j, ..., i', r1]. Later, j informs i of the newly learned route [i, j, ..., i', r1],
and i changes its route again to [i, j, ..., i', r1], which goes through the import neighbor j. Thus the kind of
distribution-inherent instability where the adopted route is invalid happens here.
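
The scenario above can be replayed with a minimal sketch. It assumes a simplified SPF decision process at AS i (keep the latest route from each import neighbor, adopt the shortest); the function name `replay` and the concrete AS names and path lengths are hypothetical, chosen only so that [i, j, ..., i', r2] is longer than r3, which is longer than [i, j, ..., i', r1].

```python
def replay(announcements):
    """Replay route announcements arriving at AS i under a shortest-path-first rule.

    announcements: list of (import_neighbor, as_path) in arrival order.
    Returns the sequence of routes that i adopts (one entry per route change).
    """
    rib_in = {}      # latest route learned from each import neighbor
    current = None   # route currently adopted by i
    history = []
    for neighbor, path in announcements:
        rib_in[neighbor] = path               # a newer announcement replaces the older one
        best = min(rib_in.values(), key=len)  # SPF: prefer the shortest AS path
        if best != current:
            current = best
            history.append(current)
    return history

# Hypothetical paths: r2 is two hops longer than r1; r3 lies strictly in between.
via_j_r2 = ["i", "j", "i'", "b", "c", "e", "d"]   # [i, j, ..., i', r2], length 7
via_j_r1 = ["i", "j", "i'", "a", "d"]             # [i, j, ..., i', r1], length 5
r3 = ["i", "j'", "x", "y", "z", "d"]              # learned via j', length 6

# Delay not proportional to hop count: the longer route reaches i first.
unordered = replay([("j", via_j_r2), ("j'", r3), ("j", via_j_r1)])

# Delay proportional to hop count: the shortest route reaches i first.
ordered = replay([("j", via_j_r1), ("j'", r3)])
```

In the first replay i adopts three routes in turn (two of them transient); in the second it adopts the final route immediately and never changes it.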

The kind of distribution-inherent instability where the adopted route is valid but transient is even more
likely to happen in G-BGP and BGP under non-SPF policies than under the SPF policy, because
a more preferred route of an AS is more likely to be formed later than some less preferred route of the AS
under non-SPF policies. Since the kind of distribution-inherent instability where the adopted route is invalid
is caused by fault-agnostic instability and by the kind of distribution-inherent instability where the adopted
route is valid but transient, an increase in the likelihood of the valid-but-transient kind also increases the
likelihood of the invalid-route kind.

In G-BGP, whenever the state of an AS i changes (e.g., changing route, associated links fail-stopping),
information about the state change (e.g., point of AS-failure, point of channel-failure, point of channel-
withdrawal, and point of node-join) will be piggybacked in the first message that is sent out from i.
Therefore, when another AS j changes its state due to the state change at i, j knows the exact change at i or
can resolve certain uncertainties, and will not use information that is invalidated by the state change at i.
Thus fault-agnostic instability is avoided in G-BGP. However, this is not the case in BGP. One example is
as follows. Suppose there exists an AS i that has two import neighbors j and j' whose routes go through the
same AS k other than i, j, and j'. At a certain moment t0, i chooses the route learned from j as its route to d;
sometime later, k changes its route and j changes the route it advertised to i accordingly. Then i may choose
the route it previously learned from j' as its new route in BGP, and fault-agnostic instability is incurred.
(This will not happen in G-BGP because information about the change at k will be propagated to i and i
will learn that the route it previously received from j' is already invalid.)
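
The example with i, j, j', and k can be illustrated with a small sketch. It is not G-BGP's actual data structures: the piggybacked fault information is modeled, as an assumption, by a single "stale segment" (k, old next hop of k), and the helper names `uses_segment`, `select`, and `on_update` are hypothetical. Route selection is shortest-path-first for simplicity.

```python
def uses_segment(path, segment):
    """True if the AS path traverses the directed link given by segment."""
    a, b = segment
    return any(x == a and y == b for x, y in zip(path, path[1:]))

def select(rib_in):
    """Pick the shortest candidate route that is still considered valid."""
    valid = [p for p in rib_in.values() if p is not None]
    return min(valid, key=len) if valid else None

def on_update(rib_in, neighbor, new_path, stale_segment=None):
    """Process one UPDATE at AS i and return the route i selects afterwards.

    stale_segment=None models plain BGP: only the sender's entry is replaced.
    Otherwise (G-BGP-like), every candidate using the stale segment is invalidated.
    """
    rib_in = dict(rib_in)
    rib_in[neighbor] = new_path
    if stale_segment is not None:
        for n, p in rib_in.items():
            if p is not None and uses_segment(p, stale_segment):
                rib_in[n] = None
    return select(rib_in)

# Before the change, both import neighbors of i route through k, and k routes via m.
rib_in_i = {"j": ["j", "k", "m", "d"], "j'": ["j'", "k", "m", "d"]}

# k switches to a longer route via n and o; only j has told i so far.
new_from_j = ["j", "k", "n", "o", "d"]

bgp_choice = on_update(rib_in_i, "j", new_from_j)                          # fault-agnostic
gbgp_choice = on_update(rib_in_i, "j", new_from_j, stale_segment=("k", "m"))
```

Without the fault information, i falls back to the stale route via j', which is exactly the fault-agnostic instability described above; with it, i discards that candidate and keeps a route consistent with the change at k.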


□ 


In  the  case  of  Tchange,  we  have 

Lemma 4 (Convergence stability after Tchange) When Tchange occurs, for both the SPF and non-SPF
policies, G-BGP converges with no fault-agnostic instability, but fault-agnostic instability can occur in
BGP convergence; distribution-inherent instability can occur in G-BGP and BGP.

Proof: In G-BGP, whenever the state of an AS i changes (e.g., changing route, associated links
fail-stopping), information about the state change (e.g., point of AS-failure, point of channel-failure, point of
channel-withdrawal, and point of node-join) will be piggybacked in the first message that is sent out from
i. Therefore, when another AS j changes its state due to the state change at i, j knows the exact change at
i or can resolve certain uncertainties, and will not use information that is invalidated by the state change at
i. Thus fault-agnostic instability is avoided in G-BGP. However, this is not the case in BGP, as shown in the
proof for Theorem 3.


□ 


Lemmas  2,  3,  and  4  imply 

Theorem 2 (Fault-agnostic-instability freedom in G-BGP) When any of the events Tdown, Tup, and Tchange
occurs, G-BGP converges with no fault-agnostic instability; this holds whether or not the SPF (or some non-
SPF) route ranking policy is used.

Theorem 3 (Fault-agnostic instability in BGP) Fault-agnostic instability can occur during BGP
convergence in the events of both Tdown and Tchange, whether or not the SPF (or some non-SPF) route ranking
policy is used; during BGP convergence in the event of Tup, fault-agnostic instability can occur if some
non-SPF policy is used, but fault-agnostic instability does not occur if the SPF policy is used.

By Theorems 2 and 3, we see that G-BGP eliminates all the fault-agnostic instability that can occur
during BGP convergence. Eliminating fault-agnostic instability avoids the type of delayed BGP
convergence that is due to the mis-interaction between BGP convergence instability and BGP route flap
damping. Moreover, by eliminating fault-agnostic instability, G-BGP improves BGP convergence speed
substantially and achieves asymptotically optimal convergence speed in several scenarios where BGP
convergence is severely delayed (such as when a node or a link fail-stops), as shown later in this section and by
simulation in Section 6.

Furthermore, in the event of Tdown where BGP exhibits its worst instability, the elimination of fault-
agnostic instability in G-BGP also prevents distribution-inherent instability from happening, as shown by

Theorem 4 (Distribution-inherent-instability freedom in G-BGP after Tdown) G-BGP converges with
no distribution-inherent instability in the event of Tdown, whether or not the SPF (or some non-SPF) route
ranking policy is used.

Proof:  This  claim  holds  as  a  result  of  Lemma  2. 

□ 


We  summarize  the  stability  during  G-BGP  and  BGP  convergence  in  Table  1. 


Stability              SPF Policy              Non-SPF Policy
Event     Protocol     FAI        DII          FAI        DII
Tdown     G-BGP        No         No           No         No
          BGP          Possible   Possible     Possible   Possible
Tup       G-BGP        No         Possible     No         Possible
          BGP          No         Possible     Possible   Possible
Tchange   G-BGP        No         Possible     No         Possible
          BGP          Possible   Possible     Possible   Possible

Table  1:  Stability  during  G-BGP  and  BGP  convergence.  In  the  table,  FAI  and  DII  denote  fault-agnostic  and 
distribution-inherent  instability  respectively. 


5.2  Convergence  speed 

For  convenience,  we  define  the  following  notations: 


R(i, V, q):     max_{j ∈ V} dist(i, j, q), where dist(i, j, q) denotes the number of inter-AS hops in the shortest
                path from node i to j in the policy graph Gp.q, and each inter-AS hop in a path C in Gp.q is
                a maximal-length path segment in C that consists of nodes from the same AS;

D(q):           max_{j ∈ V.q} length(j.AS-path.q), where length(j.AS-path.q) denotes the number of inter-
                AS hops in the route j.AS-path.q;

LP(V, q):       the number of inter-AS hops in the longest simple path in the "subgraph of Gp.q on the set
                V of nodes";

hops(i, J, q):  the number of inter-AS hops between ASes i.AS and J in the route i.AS-path.q.
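
To make these quantities concrete, the following sketch computes them on a toy network. It assumes, for illustration only, that every AS is a single node (so one inter-AS hop is one edge) and that the policy graph is undirected; the function names, the four-AS topology, and the route table are hypothetical.

```python
from collections import deque

def max_shortest_distance(adj, src, nodes):
    """R(src, nodes, q): largest shortest-path hop count from src to any node in nodes."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist[j] for j in nodes if j in dist)

def max_route_length(routes):
    """D(q): hop count of the longest route currently in use."""
    return max(len(path) - 1 for path in routes.values())

def longest_simple_path(adj, nodes):
    """LP(nodes, q): hop count of the longest simple path in the subgraph on nodes.

    Exhaustive depth-first search; exponential in general, fine for toy graphs.
    """
    nodes = set(nodes)
    best = 0

    def dfs(u, visited, hops):
        nonlocal best
        best = max(best, hops)
        for v in adj[u]:
            if v in nodes and v not in visited:
                visited.add(v)
                dfs(v, visited, hops + 1)
                visited.remove(v)

    for start in nodes:
        dfs(start, {start}, 0)
    return best

# Toy policy graph: four fully connected ASes, destination d.
adj = {
    "d": ["a", "b", "c"],
    "a": ["d", "b", "c"],
    "b": ["d", "a", "c"],
    "c": ["d", "a", "b"],
}
# Routes in use at state q; b prefers a two-hop route (a non-SPF choice).
routes = {"a": ["a", "d"], "b": ["b", "a", "d"], "c": ["c", "d"]}

R = max_shortest_distance(adj, "d", adj.keys())
D = max_route_length(routes)
LP = longest_simple_path(adj, adj.keys())
```

On this example the three values come out as 1, 2, and 3, which gives a feel for how far apart the shortest-path, in-use-route, and longest-simple-path measures can be even on a tiny graph.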


We first analyze the convergence speed of G-BGP and BGP in the event of Tdown, for both the SPF
route ranking policy and non-SPF policies. In the event of Tup or Tchange, distribution-inherent instability
can happen during G-BGP convergence, which makes it difficult to asymptotically compare G-BGP and
BGP convergence speed when non-SPF policies are used. Therefore, for the scenario where event Tup or
Tchange occurs, we only analyze the case when the SPF policy is used; we study the cases when non-SPF
policies are used via simulation in Section 6.

In the event of Tdown, we have

Theorem 5 (Convergence speed after Tdown) When a network is at a state q0,

(i) If d fail-stops gracefully, or if d fail-stops grossly when it has a single neighboring AS, G-BGP
converges within Θ(R(i, V.q0, q0)) time, which is asymptotically optimal; this holds whether or not the
SPF (or some non-SPF) route ranking policy is used;

If d fail-stops grossly when it has multiple neighboring ASes, G-BGP converges within O(D(q0))
time, whether or not the SPF (or some non-SPF) policy is used;

(ii) If d fail-stops, it takes BGP up to O(LP(V.q0, q0)) time to converge when the SPF policy is used and
O(n!) time when a non-SPF policy is used.

(iii) R(i, V.q0, q0) ≤ D(q0) ≤ LP(V.q0, q0).

Proof: In the case of Tdown, there are several sub-cases: whether d gracefully or grossly fail-stops, and
whether d has one or more than one export neighbor before it fail-stops. We call the case of Tdown where
d gracefully fail-stops graceful Tdown, and the case of Tdown where d grossly fail-stops gross Tdown.

(Note: given that intra-AS coordination is quick and the coordination time is bounded from above by
a small constant, and that we are interested in inter-AS coordination in inter-AS routing, the analysis
of the paper focuses on the level of inter-AS coordination. Thus, for conciseness, the unit of consideration in
our analysis is an AS instead of a node.)

We first prove the convergence properties of G-BGP. As discussed in the proof for Theorem 2, all the
UPDATE messages are withdrawal-UPDATE messages in G-BGP when Tdown occurs. Therefore, once an
AS i withdraws its route at some time, it will not change its route again. Thus, to deduce the time taken for
G-BGP to converge to a state in L, we only need to deduce the time taken for the last AS to withdraw its
route to d after d fail-stops.

• In the case of gross Tdown where d has a single export neighbor d', d' detects the fail-stop of link
(d, d') and withdraws its route to d since d' has no other candidate route to d. Then, d' sends out a
point of channel-failure ((d, d'), timePassed), which is piggybacked in every UPDATE message. An
AS i other than d' withdraws its route once it receives an UPDATE message, since the current route
of i and all its candidate routes go through link (d, d'). Then, the time taken for G-BGP to converge
in this case depends on the time taken for the last AS to first receive an UPDATE message.

If the number of hops in the shortest path from d to an AS i in the policy graph Gp.q0 is l_i, and l_i ≥ 2
(note: if l_i ≤ 1, then i is either d or d', which does not receive any UPDATE message), then the time
taken for i to first receive an UPDATE message is Θ(l_i). We prove this claim by induction on l_i as
follows:

- Base: the claim trivially holds when l_i = 2, since every AS with l_i being 2 is an export neighbor
of d'.

- Hypothesis: the claim holds when l_i = l.

- Induction: for every AS i with l_i being l + 1, it must have an import neighbor j with l_j being l.
By hypothesis, j must have withdrawn its route and sent out an UPDATE message including
the point of channel-failure within Θ(l) time. Since i will receive the UPDATE message within
U_d time after the message is sent out from j, i will receive the UPDATE message within Θ(l + 1),
i.e., Θ(l_i), time. Therefore, the claim holds when l_i = l + 1.

Since the maximum number of hops in the shortest path from d to an AS i in the policy graph Gp.q0 is R(i, V.q0, q0),
the time taken for G-BGP to converge in this case is Θ(R(i, V.q0, q0)).

Moreover, given that it takes Θ(R(i, V.q0, q0)) time for any information to travel from d to the node
that is farthest from d, Θ(R(i, V.q0, q0)) is the lower bound on the convergence time of any stateful
routing protocol in the event of Tdown. Thus Θ(R(i, V.q0, q0)) is the optimal achievable convergence
time, and G-BGP converges in an asymptotically optimal manner.
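
The induction above can be checked numerically on a toy topology. The sketch below makes assumptions that are not in the analysis itself: every inter-AS message takes an integer delay between DELTA and U, an AS forwards the point of channel-failure as soon as it first receives it, and hop counts are measured from d' (that is, l_i - 1). The names `hop_counts`, `first_receipt_times`, `DELTA`, `U`, and the topology are hypothetical.

```python
import heapq
import random
from collections import deque

def hop_counts(adj, src):
    """Breadth-first hop count from src to every reachable AS."""
    hops = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops

def first_receipt_times(adj, src, delay):
    """Earliest time each AS first receives the flooded UPDATE (Dijkstra on link delays)."""
    times = {src: 0}
    heap = [(0, src)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > times[u]:
            continue  # stale heap entry
        for v in adj[u]:
            arrival = t + delay[(u, v)]
            if arrival < times.get(v, float("inf")):
                times[v] = arrival
                heapq.heappush(heap, (arrival, v))
    return times

DELTA, U = 1, 3  # assumed lower and upper bounds on per-hop message delay

# Toy export topology rooted at d', the single export neighbor of the failed AS d.
adj = {
    "d_prime": ["a", "b"],
    "a": ["d_prime", "c"],
    "b": ["d_prime", "c"],
    "c": ["a", "b", "e"],
    "e": ["c"],
}
rng = random.Random(0)
delay = {(u, v): rng.randint(DELTA, U) for u in adj for v in adj[u]}

hops = hop_counts(adj, "d_prime")
times = first_receipt_times(adj, "d_prime", delay)
convergence_time = max(times.values())  # the last AS to withdraw its route
```

Under these assumptions each AS at h hops from d' first hears of the failure no earlier than h * DELTA and no later than h * U, so the time for the last AS to withdraw grows linearly with the largest hop count.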

• In the case of graceful Tdown, a point of AS-failure (d) is piggybacked in every UPDATE message.
An AS i other than d and the export neighbors of d withdraws its route once it receives an UPDATE
message, since the current route of i and all its candidate routes go through AS d. Then, the time
taken for G-BGP to converge in this case depends on the time taken for the last AS to first receive
an UPDATE message. This is the same as in the case of gross Tdown where d has a single export
neighbor d'. Thus, G-BGP converges within Θ(R(i, V.q0, q0)) time in this case too.

• In the case of gross Tdown where d has multiple export neighbors, if hops(d, i) = l'_i for an AS i and
l'_i ≥ 1 (note: if l'_i ≤ 0, then i is d), then the time taken for i to withdraw its route is O(l'_i). We prove
this claim by induction on l'_i as follows:

- Base: this claim holds when l'_i = 1, since an AS i with l'_i being 1 must be an export neighbor
of d and i.d.ML = 1, which means that actions A2 and A4 are executed with i.suspect being
false within constant time, thus O(1) time.

- Hypothesis: the claim holds when l'_i ≤ l.

- Induction: for an AS i with l'_i = l + 1, each of its import neighbors j must be such that l'_j ≤ l and
j withdraws its route as well as sends out a withdrawal-UPDATE message within O(l) time.
After all of its import neighbors send out their withdrawal-UPDATE messages, i will execute
actions A3 and A4 within U_d and thus O(1) constant time, which means that i withdraws its
route within O(l) + O(1) = O(l + 1) time.

Therefore, the claim holds when l'_i = l + 1.

Thus, G-BGP converges within O(D(q0)) time in both the case of the SPF policy and the case of
non-SPF policy.

More tightly, we prove that G-BGP converges within O(max{hops(d, i) + Dist(i) : action A5 is executed at i})
time after Tdown as follows:

- For any AS i that executes action A5 and thus enters the process of resolving uncertainty between
link failure and node failure, it withdraws its route within O(hops(d, i)) time. This claim can
be proved by induction on i.d.ML in the same way we prove that the time taken for an AS j to
withdraw its route is O(l'_j) in G-BGP if the maximum number of hops in any simple path from
d to j in Gp.d.q0 is l'_j. For clarity, we skip the proof here.

- For an AS j where action A5 is not executed during G-BGP convergence, let i be the closest
AS to j that has executed action A5, and let dist(i, j) be the number of hops in the shortest
path from i to j in Gp.d.q0. Then, j will withdraw its route no later than O(dist(i, j)) time
after i withdraws its route, since a withdrawal-UPDATE message will propagate to j no later
than O(dist(i, j)) time after i withdraws its route. Therefore, j will withdraw its route within
O(hops(d, i) + dist(i, j)) time after Tdown occurs.

Therefore, all the ASes in the network will withdraw their routes within O(max{hops(d, i) + Dist(i) :
action A5 is executed at i}) time after Tdown occurs.

On the other hand, instead of converging within Θ(R(i, V.q0, q0)) or O(D(q0)) time, BGP is proved to
take up to Θ(LP(V.q0, q0)) (thus O(LP(V.q0, q0))) time to converge after Tdown [18].

By the definitions of R(i, V.q0, q0), D(q0), and LP(V.q0, q0), it is straightforward that
R(i, V.q0, q0) ≤ D(q0) ≤ LP(V.q0, q0).


□ 


In  the  event  of  Tup,  we  have 

Theorem 6 (Convergence speed after Tup) When the SPF route ranking policy is used, G-BGP as well as
BGP converges to a stable state q'0 within Θ(R(i, V.q'0, q'0)) time in the event of Tup, which is asymptotically
optimal.

Proof: [18] and [22] have proved that BGP converges from the initial state q0 to a state q'0 in L within
Θ(R(i, V.q'0, q'0)) time in the case of the SPF policy. This convergence result of BGP also applies to G-BGP,
since G-BGP behaves the same as BGP except that G-BGP is more conservative when changing routes, so
that fault-agnostic instability can be reduced in G-BGP. More formally, the real-time dispute digraph [22]
for a network G(V, E, P) in BGP is the same as that in G-BGP; therefore, the convergence result of BGP
also applies to G-BGP.

Moreover, it is straightforward that the lower bound on convergence time in the event of Tup is Θ(D(q'0)),
since it takes at least Θ(D(q'0)) time to form the longest route used by a node after convergence. When the
SPF route ranking policy is used, D(q'0) equals R(i, V.q'0, q'0). Therefore, the lower bound on convergence
time is also Θ(R(i, V.q'0, q'0)) when the SPF policy is used. Thus, G-BGP and BGP are asymptotically
optimal in convergence time in the event of Tup.

□ 

In the event of Tchange, not every node needs to change route. A node is affected by a fault f if the node
changes route at least once during convergence after f occurs; the set of all the nodes that are affected by a
fault f is called the affectation region of f. Then, we have

Lemma 5 (Minimized affectation region for the SPF policy) When the SPF route ranking policy is used,
the affectation region in G-BGP as well as BGP is minimal in the event of Tchange.

Proof :  We  analyze  the  affectation  region  in  both  G-BGP  and  BGP  under  different  fault  scenarios  when  the 
SPF  route  ranking  policy  is  used: 

• In the case of an AS fail-stop, a link fail-stop, or a routing policy change at an AS, those ASes whose
routes go through the fail-stopped AS, fail-stopped link, or withdrawn links before Tchange have to
change their routes, since the AS or link(s) are not up anymore. Therefore, these ASes compose
the minimum affectation set MAS.

The route of every AS i that is not in set MAS does not go through the fail-stopped AS, fail-stopped
link, or withdrawn link(s) before Tchange. Such an AS i will not change its route after Tchange, since no AS in
MAS can offer i a shorter route than what i has before Tchange. Therefore, i will not be affected
during convergence of G-BGP and BGP.

Therefore, the affectation region in both G-BGP and BGP is minimized to be the minimum affectation
set in the case of the SPF policy.

• In the case of an AS or a link join, those ASes whose routes can be shortened by the joining of the new
AS or link have to change their routes, and these ASes compose the minimum affectation set MAS.
Every AS i whose route cannot be shortened by the joining of the new AS or link will not change
its route, since no import neighbor of i can offer i a shorter route. Therefore, the affectation region in
both G-BGP and BGP is minimized in this case too.


□ 


For the case where a network converges from a state q0 to another state q1, we define the following
notations:

AR(q0, q1):         the set of nodes in V.q0 that change route from q0 to q1,
                    i.e., {k : k ∈ V.q0 ∧ k.AS-path.q0 ≠ k.AS-path.q1};

pt(k, q0, q1):      the node in AR(q0, q1) whose AS is in the route k.AS-path.q1 and whose next-hop
                    does not change route from q0 to q1;

Tri(k, X, q0, q1):  hops(pt(k, q0, q1), X, q0) + hops(k, pt(k, q0, q1).AS, q1) for a node k in AR(q0, q1).
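
As an illustration of these definitions, the sketch below evaluates them on a toy example. It assumes single-node ASes, represents each state as a hypothetical mapping from an AS to its AS path (starting at the AS itself), and reads pt(k, q0, q1) as the first affected AS on k's new path whose next hop is unaffected, falling back to k itself when k has no new route. These readings and all names are assumptions made for the example.

```python
def affected_region(q0, q1):
    """AR(q0, q1): ASes present in q0 whose AS path differs in q1 (or is gone)."""
    return {k for k in q0 if q1.get(k) != q0[k]}

def hops(path, target):
    """Number of hops from the head of path to target along that path."""
    return path.index(target)

def pt(k, q1, ar):
    """Affected AS on k's new path whose next hop did not change route."""
    path = q1.get(k)
    if not path:
        return k  # no new route: k itself is the turning point
    for node, nxt in zip(path, path[1:]):
        if node in ar and nxt not in ar:
            return node
    return k

def tri(k, x, q0, q1, ar):
    """Tri(k, X, q0, q1) = hops(pt, X, q0) + hops(k, pt, q1)."""
    p = pt(k, q1, ar)
    new_path = q1.get(k)
    down = hops(q0[p], x)                       # failure news travels from X to pt
    up = hops(new_path, p) if new_path else 0   # new route travels from pt to k
    return down + up

# State q0: a and b reach d through X; c and y are unaffected by a fault at X.
q0 = {
    "X": ["X", "d"],
    "a": ["a", "X", "d"],
    "b": ["b", "a", "X", "d"],
    "y": ["y", "d"],
    "c": ["c", "y", "d"],
}
# State q1 after X fail-stops: a reroutes via c, and b follows a.
q1 = {
    "a": ["a", "c", "y", "d"],
    "b": ["b", "a", "c", "y", "d"],
    "y": ["y", "d"],
    "c": ["c", "y", "d"],
}

ar = affected_region(q0, q1)
tri_values = {k: tri(k, "X", q0, q1, ar) for k in ar if k in q1}
```

Here a is one hop from the fault and is its own turning point, while b first needs a to converge and then one more hop for the new route to arrive, so the largest Tri value is attained at b.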


Then,  we  have 

Theorem 7 (Convergence speed after Tchange) When a network is at a state q0 and when the SPF route
ranking policy is used,

(i) If a node in an AS X or a link associated with the node fail-stops, or if the routing policies of X change,
G-BGP converges to a stable state q1 within Θ(max_{k ∈ V.q1 ∧ k ∈ AR(q0, q1)} Tri(k, X, q0, q1)) time, which
is asymptotically optimal; it takes BGP up to O(LP(AR(q0, q1), q0)) time to converge in this case,
and LP(AR(q0, q1), q0) ≥ max_{k ∈ V.q1 ∧ k ∈ AR(q0, q1)} Tri(k, X, q0, q1);

(ii) If a node i or a link associated with i joins, G-BGP as well as BGP converges to a stable state q1
within Θ(R(i, AR(q0, q1), q1)) time, which is asymptotically optimal.

Proof: To analyze the convergence time of G-BGP and BGP, we only need to compute the longest possible
time taken for an AS to converge to its new state in L. For convenience, we let k' = pt(k, q0, q1).

• When the cause of Tchange is a graceful node fail-stop, or a gross node fail-stop or a link fail-stop
where the fail-stopped or suspected node i has a single export neighbor, we prove the claim by proving
that every affected AS k takes Θ(Tri(k, X, q0, q1)) time to converge to its new state
in L. We proceed by induction on hops(k, k'.AS, q1).

- Base: if hops(k, k'.AS, q1) = 0 for an AS k, then the new route of k after Tchange does
not go through any affected AS or there is no new route for k. Therefore, the time taken
for k to converge to its new state in L is equal to the time taken for the point of AS-failure
(in the case of graceful node fail-stop), or the point of channel-failure or the point of segment-
withdrawal (in the case of gross node fail-stop with a single export neighbor), to reach k. This time
is Θ(Tri(k, X, q0, q1)), since the number of hops between i and k is Θ(Tri(k, X, q0, q1)) when
hops(k, k'.AS, q1) = 0, and there is no uncertainty to resolve and thus no waiting at the ASes
between i and k in the route of k before Tchange. Thus, the claim holds for every AS k with
hops(k, k'.AS, q1) = 0.

- Hypothesis: the claim holds for every AS k with hops(k, k'.AS, q1) = h (h ≥ 0).

- Induction: for every k with hops(k, k'.AS, q1) = h + 1 (h ≥ 0), it must have an import neighbor
k'' with hops(k'', k'.AS, q1) = h. By hypothesis, k'' must have converged to its new state in
L within Θ(Tri(k'', X)) time. Within constant time after k'' converges to its new state in L, an
UPDATE message with the attached point of AS-failure or point of channel-failure will reach
k from k'', at which point k learns its new route, since there is no need to resolve any uncertainty
in the case of graceful AS fail-stop or gross fail-stop with a single export neighbor. Thus, k
converges to its new state in L within Θ(Tri(k'', X) + 1) time, that is, Θ(Tri(k, X, q0, q1)) time,
since Tri(k, X, q0, q1) = Tri(k'', X) + 1.

When the SPF route ranking policy is used, the minimum time required for a node j to converge is
proportional to Tri(j). Thus, the lower bound on the convergence time after the fault is Θ(max_j{Tri(j) :
j ∈ V.q1 ∧ j.AS-path.q1 ≠ j.AS-path.q0}). Therefore, G-BGP converges with an asymptotically
optimal speed.

In BGP, when an AS i fail-stops, the set of ASes whose routes go through i have to change their
routes. During convergence, fault-agnostic instability can be incurred and invalid routes can be explored.
Therefore, in the worst case the time taken for BGP to converge is proportional to the length of the
longest invalid route that may be explored [17, 18, 22]. In the case of Tchange, the length of the longest
invalid route that may be explored is the number of hops LP({k : k ∈ V.q0 ∧ k.AS-path.q1 ≠ k.AS-
path.q0}, q0) in the longest simple path in the subgraph of the policy graph (after faults) on the set
of ASes that are affected (i.e., the affectation region). Therefore, it takes BGP O(LP({k : k ∈
V.q0 ∧ k.AS-path.q1 ≠ k.AS-path.q0}, q0)) time to converge after Tchange in worst cases, such as
when every affected AS has no route to reach d anymore or when the alternate routes of the affected
ASes are very long. Thus, it takes BGP O(LP({k : k ∈ V.q0 ∧ k.AS-path.q1 ≠ k.AS-path.q0}, q0))
time to converge after Tchange.

•  When  the  cause  for  the  Tchange  is  the  routing  policy  change  at  an  AS  i,  there  are  two  cases:  i  removes 
some  of  its  import  neighbors  or  i  removes  some  of  its  export  neighbors.10 

If i removes some of its import neighbors and the policy change does lead to an event Tchange, then
the current link e associated with i that i uses in forwarding traffic to d is withdrawn by i too. Then a
point of channel-withdrawal with the sequence number of i is propagated out from i to every other AS
whose route goes through i or e before Tchange. The convergence behavior of G-BGP in this case is the
same as that in the case of gross AS fail-stop or a link fail-stop where the fail-stopped or suspected
AS has a single export neighbor.

If i removes some of its export neighbors and this results in an event Tchange, then a set of points of
channel-withdrawal for the withdrawn links will be propagated out from i and reach every AS whose
route goes through one of the withdrawn links before Tchange. The convergence behavior of G-BGP in
this case is the same as that in the case of graceful AS fail-stop.

10 Since the policy is fixed in the case of the SPF policy, we do not consider the case where an AS changes from one route to
another route of equal length.

Similar to G-BGP, the convergence behavior of BGP in the case of routing policy change at an AS
is the same as that in the case of graceful AS fail-stop or gross AS fail-stop where the fail-stopped
AS i has a single export neighbor. Therefore, it takes BGP up to Θ(LP({k : k ∈ V.q0 ∧ k.AS-
path.q1 ≠ k.AS-path.q0}, q0)) time to converge after the routing policy change at i.

•  When the cause for the Tchange is a gross node fail-stop or a link fail-stop where the fail-stopped or
suspected node i has multiple export neighbors, the convergence behavior of G-BGP differs from that
in the case of gross AS fail-stop or link fail-stop where the fail-stopped or suspected AS has only a single
export neighbor, in the sense that some AS(es) relatively close to the fail-stopped or suspected AS may
need to resolve the uncertainty between link failure and node failure by executing action $A_5$. The
uncertainty resolution procedure can introduce delay in G-BGP convergence compared to the optimal
achievable performance, even though the extra delay is small because the uncertainty is resolved
locally around the fail-stopped AS or link. If an AS j executes action $A_5$ to resolve uncertainty,
the maximum delay j can introduce to G-BGP convergence is $O(hops(j, i.AS, q_0) - dist(i, j, q_0))$,
where $dist(i, j, q_0)$ is the number of hops in the shortest path from i to j in the policy graph $G_p.q_0$.
Since no two ASes where one AS is in the route of the other AS before or after Tchange will both
execute action $A_5$ to resolve uncertainty with respect to the liveness of a suspected AS, and two
ASes where neither is in the route of the other before or after Tchange execute the uncertainty
resolution procedure in parallel, the overall maximum delay that can be introduced to G-BGP convergence
is $O(\max_j\{hops(j, i.AS, q_0) - dist(i, j, q_0) : A_5 \text{ is executed at } j\})$. Thus, G-BGP converges
within $O(\max_k\{Tri(k, l, q_0, q_1) : k \text{ is affected}\} + \max_j\{hops(j, i.AS, q_0) - dist(i, j, q_0) : A_5 \text{ is executed at } j\})$
time. Since $hops(j, i.AS, q_0) - dist(i, j, q_0) < Tri(j)$ for every node
j that executes action $A_5$, G-BGP converges within $O(\max_k\{Tri(k, l, q_0, q_1) : k \text{ is affected}\})$
time, which has been shown to be asymptotically optimal.

In BGP, the probability that fault-agnostic instability occurs in the case of a gross AS fail-stop or a
link fail-stop where the fail-stopped or suspected AS i has multiple export neighbors is much higher
than that in the case of a gross AS fail-stop or a link fail-stop where the fail-stopped or suspected
AS i has a single export neighbor. Therefore, it can take BGP up to
$O(CP(\{k : k \in V.q_0 \wedge k.\text{AS-path}.q_1 \neq k.\text{AS-path}.q_0\}, q_0))$ time to converge after a gross AS fail-stop or a link fail-stop where
the fail-stopped or suspected AS i has multiple export neighbors.

•  When the cause for the Tchange is an AS or a link join, the set of ASes whose distance can be reduced
by the joining of the new AS or link are affected, and only these ASes are affected in the case of the
SPF policy. Therefore, the time taken for G-BGP and BGP to converge equals the maximum time
taken for an affected AS (i.e., one some of whose nodes change route when the network converges from
$q_0$ to $q_1$) to change its route to a shorter one which goes through the newly joined AS or link. The
convergence behavior of G-BGP and BGP in this case is the same as that in the case of Tup where the
destination is the newly joined AS i or the endpoint of the joining link that is in the route of the other
endpoint, and the ASes in the network are the set of affected ASes. By Theorem 6, it takes G-BGP and
BGP $\Theta(\mathcal{R}(i, V.q_0, q_0))$ time to converge, which is asymptotically optimal.


□ 
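The last step of the multiple-export-neighbor case in the proof above, where the uncertainty-resolution delay is absorbed into the main term, can be spelled out as follows. This is only a sketch: $T_k$ and $\delta_j$ are shorthand introduced here (not the paper's notation) for the Tri term of an affected node k and for the hops-minus-dist term of a node j that executes the uncertainty-resolution action, and the second line assumes that every such j is itself an affected node.

```latex
\begin{align*}
\max_{k} T_k + \max_{j} \delta_j
  &\le \max_{k} T_k + \max_{j} T_j
     && \text{since } \delta_j < T_j \text{ for every } j \text{ that executes the action} \\
  &\le 2 \max_{k} T_k
     && \text{assuming every such } j \text{ is affected} \\
  &= O\bigl(\max_{k} T_k\bigr).
\end{align*}
```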
By Theorems 5, 6, and 7, we see that G-BGP either achieves asymptotically optimal convergence speed
or asymptotically improves the convergence speed of BGP in several scenarios where BGP exhibits delayed
convergence (such as when a node or a link fail-stops). By Theorem 3, Lemma 5, and Theorem 7, we
observe that, when a node or a link fail-stops or when an AS changes routing policies, fault-agnostic
instability prevents BGP from converging at the asymptotically optimal speed that G-BGP achieves, even
though the affectation region is also minimal in BGP when the SPF route ranking policy is used.

On  the  other  hand,  when  the  SPF  policy  is  used  (as  is  the  case  in  most  ASes  in  the  Internet),  BGP 
converges  at  an  asymptotically  optimal  speed  when  a  node  or  a  link  joins.  This  conforms  with  our  simulation 
results  (as  shown  in  Section  6)  and  the  experimental  observations  [17,  18,  20]  that  BGP  does  not  experience 
much  delay  in  convergence  when  a  node  or  a  link  joins. 

We  summarize  the  convergence  speed  of  G-BGP  and  BGP  in  Table  2. 


Speed   | Scenario                                          | Protocol | SPF policy                                                         | Non-SPF policy
--------+---------------------------------------------------+----------+--------------------------------------------------------------------+---------------
Tdown   | gross Tdown with one ex-neighbor, graceful Tdown  | G-BGP    | $\Theta(\mathcal{R}(i, E.q_0, q_0))$, optimal                       | same as left
        |                                                   | BGP      | $O(CP(V.q_0, q_0))$                                                | same as left
        | gross Tdown with multiple ex-neighbors            | G-BGP    | $O(D(q_0))$                                                        | same as left
        |                                                   | BGP      | $O(CP(V.q_0, q_0))$                                                | same as left
Tup     |                                                   | G-BGP    | $\Theta(\mathcal{R}(i, V.q'_0, q'_0))$, optimal                     |
        |                                                   | BGP      | $\Theta(\mathcal{R}(i, V.q'_0, q'_0))$, optimal                     |
Tchange | node or link fail-stop, policy change             | G-BGP    | $\Theta(\max_k\{Tri(k, l, q_0, q_1) : k \text{ is affected}\})$, optimal |
        |                                                   | BGP      | $O(CP(AR(q_0, q_1), q_0))$                                         |
        | node or link join                                 | G-BGP    | $\Theta(\mathcal{R}(i, AR(q_0, q_1), q_1))$, optimal                |
        |                                                   | BGP      | $\Theta(\mathcal{R}(i, AR(q_0, q_1), q_1))$, optimal                |

Table 2: Convergence speed of G-BGP and BGP. In the table, $\mathcal{R}(i, V.q'_0, q'_0) < D(q_0) < CP(V.q_0, q_0)$,
and $\max_k\{Tri(k, l, q_0, q_1) : k \text{ is affected}\} < CP(AR(q_0, q_1), q_0)$; "optimal" in a box means that optimal
convergence speed is achieved in G-BGP or BGP in the corresponding scenario.


6  Simulation  results 

We implement G-BGP in SSFNet [1], a network simulator which has implemented a variety of standard
Internet protocols such as BGP, OSPF, and TCP. For fidelity of simulation, we use realistic Internet-type
topologies [1] to evaluate the convergence properties of G-BGP. To study the impact of network size as well
as route ranking policy, we use networks ranging in size from 7 ASes to 115 ASes, and we use both the SPF
route ranking policy and a randomized non-SPF policy where every route r is assigned a random rank(r).
Then, we inject various types of faults (i.e., node fail-stop, node join, and policy change¹¹) into networks to
simulate the events of Tdown, Tup, and Tchange. The simulation results are as follows.
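The two route ranking policies used in the simulations can be illustrated with a small sketch. It is not the SSFNet implementation: the names `spf_rank`, `make_random_rank`, and `best_route` are hypothetical, a route is modeled simply as a tuple of AS numbers, and we assume that a lower rank value means a more preferred route.

```python
import random
from typing import Callable, Dict, Sequence, Tuple

Route = Tuple[int, ...]  # an AS path, modeled as a tuple of AS numbers


def spf_rank(route: Route) -> int:
    """SPF policy: a route with fewer AS hops ranks better (lower value)."""
    return len(route)


def make_random_rank(seed: int) -> Callable[[Route], float]:
    """Randomized non-SPF policy: each distinct route gets one fixed random rank."""
    rng = random.Random(seed)
    ranks: Dict[Route, float] = {}

    def rank(route: Route) -> float:
        # Draw the rank once per route so that repeated lookups agree.
        if route not in ranks:
            ranks[route] = rng.random()
        return ranks[route]

    return rank


def best_route(candidates: Sequence[Route], rank: Callable[[Route], float]) -> Route:
    """Pick the most preferred candidate under the given ranking (lower is better)."""
    return min(candidates, key=rank)
```

Under `spf_rank` the choice depends only on path length, whereas under a rank produced by `make_random_rank` a longer route may be preferred over a shorter one, which is the situation the non-SPF experiments are meant to exercise.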

Event Tdown. When the destination d fail-stops, the number of unnecessary route changes during
convergence and the convergence time of G-BGP as well as BGP are shown in Figures 8 and 9 respectively.


¹¹ The impact of link fail-stop and link join is reflected via node fail-stop and node join respectively. Thus we do not simulate
link fail-stop or link join.
Figure 8: The number of unnecessary route changes after the destination d fail-stops. Panels: (a) SPF policy; (b) Randomized non-SPF policy. Axis label: Network size (# of ASes).

Figure 9: The convergence time after the destination d fail-stops


We see that G-BGP converges with no unnecessary route changes, as proved in Corollary ??. But
there are many unnecessary route changes during BGP convergence, and the number increases quickly as
the network size increases. If we measure convergence stability by the number of route changes during
convergence, G-BGP improves BGP convergence stability by a factor of 29.4 for the network with 115
ASes. We also observe that, as network size increases, the convergence time of G-BGP barely increases,
but the convergence time of BGP increases quickly. For the network with 115 ASes, G-BGP reduces the
convergence time of BGP by a factor of 10.2.

An interesting observation is that, in cases where the network size and the convergence time of BGP
increase (e.g., when the network size increases from 85 ASes to 115 ASes), the convergence time of G-BGP
may even decrease. The reason for this is that, as network size increases, the connectivity may increase,
which reduces the average distance between nodes and thus the G-BGP convergence time. This is in
contrast to BGP where, as network connectivity increases, the probability of using invalid routes and thus the
convergence time increase.

Event Tup. When the destination d joins, the number of unnecessary route changes during convergence
and the convergence time of G-BGP as well as BGP are shown in Figures 10 and 11 respectively.


Figure 10: The number of unnecessary route changes after the destination d joins. Panels: (a) SPF policy; (b) Randomized non-SPF policy. Axis label: Network size (# of ASes).

Figure 11: The convergence time after the destination d joins. Panels: (a) SPF policy; (b) Randomized non-SPF policy.


We  see  that,  when  the  SPF  policy  is  used,  the  number  of  unnecessary  route  changes  during  convergence 
and  the  convergence  time  of  BGP  are  the  same  as  those  of  G-BGP,  which  is  not  unexpected  since,  as  proved 
in  Theorems  3  and  6,  there  is  no  fault-agnostic  instability  during  BGP  convergence  and  the  convergence 
speed  of  BGP  is  asymptotically  optimal  in  this  case.  On  the  other  hand,  when  the  randomized  non-SPF 
policy  is  used,  the  number  of  unnecessary  route  changes  during  convergence  and  the  convergence  time  of 
BGP  are  greater  than  those  of  G-BGP. 

We also observe unnecessary route changes during G-BGP convergence, which is due to distribution-
inherent instability. However, the time taken for G-BGP to converge is still quite short in spite of the
distribution-inherent instability, which is similar to the observation in [20] that distribution-inherent
instability does not cause long delay in BGP convergence.

Event Tchange when a node fail-stops. When a non-destination node fail-stops, the number of unnecessary
route changes during convergence and the convergence time of G-BGP as well as BGP are shown in
Figures 12 and 13 respectively.

Figure 12: The number of unnecessary route changes after a non-destination node fail-stops. Surviving panel label: (b) Randomized non-SPF policy.

Figure 13: The convergence time after a non-destination node fail-stops. Panels: (a) SPF policy; (b) Randomized non-SPF policy.


In this case, the patterns of difference in convergence stability as well as speed between G-BGP and
BGP are similar to those in the event of Tdown.

Event  Tchange  when  a  node  joins.  When  a  non-destination  node  joins,  the  number  of  unnecessary  route 
changes  during  convergence  and  the  convergence  time  of  G-BGP  as  well  as  BGP  are  shown  in  Figures  14 
and  15  respectively. 

In this case, the patterns of difference in convergence stability as well as speed between G-BGP and
BGP are similar to those in the event of Tup, and the results conform with Theorem 7.

Event Tchange when an AS changes routing policy. An AS may change its route ranking policy, import
policy, and export policy. However, the effects of changing these policies are similar to one another,
i.e., some node changes route. Therefore, we only simulate the case where an AS changes its export policy.
When an AS changes its export policy, the number of unnecessary route changes during convergence and
the convergence time of G-BGP as well as BGP are shown in Figures 16 and 17 respectively.

Figure 14: The number of unnecessary route changes after a non-destination node joins

Figure 15: The convergence time after a non-destination node joins. Panels: (a) SPF policy; (b) Randomized non-SPF policy.

Figure 16: The number of unnecessary route changes after an AS changes export policy. Panels: (a) SPF policy; (b) Randomized non-SPF policy.

Figure 17: The convergence time after an AS changes export policy. Panels: (a) SPF policy; (b) Randomized non-SPF policy.


We see that the patterns of difference in convergence stability as well as speed between G-BGP and BGP
are similar to those in the case when a non-destination node fail-stops.

7  Discussion 

In this section, we discuss implementation and deployment issues for G-BGP. We also discuss approaches
to reducing distribution-inherent instability.

Enhancing  intra-AS  coordination.  G-BGP  enhances  the  intra-AS  coordination  in  BGP,  so  that  each  node 
i  informs  the  other  nodes  in  its  AS  of  the  route  of  i  itself,  the  neighboring  ASes  to  which  i  has  exported  its 
route,  and  the  neighboring  ASes  to  which  i  is  connected  via  an  up-link.  It  is  straightforward  to  implement 
this  technique  if  the  basic  BGP  [24]  is  used,  because  all  the  nodes  in  an  AS  maintain  IBGP  sessions  with  each 
other.  On  the  other  hand,  if  route  reflection  [5]  or  AS  confederation  [21]  is  used,  nodes  in  an  AS  may  not 
maintain  IBGP  sessions  with  each  other.  To  enable  enhanced  intra-AS  coordination  in  the  latter  case,  G-BGP 
requires  that,  (i)  when  route  reflection  is  used,  a  route  reflector  provide  the  required  information  regarding 
nodes  within  its  cluster  to  nodes  outside  its  cluster,  and  that,  (ii)  when  AS  confederation  is  used,  a  node  i 
having a BGP session with some node in a neighboring member-AS provide the information regarding nodes
in the member-AS of i itself. (Interestingly, it has also been proven that letting route reflectors expose more
detailed  information  about  nodes  within  their  clusters  solves  the  problem  of  persistent  route  oscillations 
caused  by  certain  “route  reflection”  configurations  [4].) 

G-BGP  in  the  presence  of  AS  partition.  The  nodes  in  an  AS  are  usually  connected.  However,  it  is 
possible  (though  rare)  that  an  AS  is  partitioned  due  to  some  severe  faults,  in  which  case  nodes  within  the  AS 
cannot  maintain  a  consistent  view  of  routing.  However,  a  consistent  view  of  routing  among  nodes  within  the 
same  AS  is  required  in  G-BGP  for  the  task  of  generating  certain  fault  information  (i.e.,  a  point  of  channel- 
withdrawal,  a  point  of  segment-withdrawal,  or  a  point  of  channel-failure),  as  well  as  the  task  of  assigning 
sequence  numbers  to  fault  information. 

To guarantee the correctness of G-BGP in the presence of AS partition, G-BGP can be adapted as
follows. First, whenever a node i in a partitioned AS would generate a point of channel-withdrawal, a point
of segment-withdrawal, or a point of channel-failure regarding a channel (i.AS, J) under normal G-BGP
operation, i generates a point of segment-withdrawal $\langle \emptyset, S', i.AS, J, i, t \rangle$ instead, where S' is the set of ASes
to which i has exported its last route. Second, whenever i would generate a sequence number under normal
G-BGP operation, i also attaches its node-id (e.g., BGP identifier) to signal the fact that the freshness of the
corresponding fault information should be verified on the basis of node i instead of its AS i.AS.
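The freshness rule described above, where obsolete fault information is rejected per AS in normal operation and per node when a node-id is attached, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the class name `FreshnessFilter` is hypothetical, sequence numbers are compared as plain integers with no wraparound handling, and a repeated sequence number is treated as obsolete.

```python
from typing import Dict, Optional, Tuple

# Freshness is tracked per origin: (AS id, node id), with node id None
# when the information is ordered on the basis of the whole AS.
OriginKey = Tuple[int, Optional[int]]


class FreshnessFilter:
    """Remembers the newest sequence number seen for each origin."""

    def __init__(self) -> None:
        self._latest: Dict[OriginKey, int] = {}

    def accept(self, as_id: int, sn: int, node_id: Optional[int] = None) -> bool:
        """Return True if (as_id, node_id, sn) is fresher than anything seen so far."""
        key = (as_id, node_id)
        last = self._latest.get(key)
        if last is not None and sn <= last:
            return False  # obsolete (or duplicate) fault information
        self._latest[key] = sn
        return True
```

With this keying, information tagged with a node-id from a partitioned AS is ordered independently of information tagged only with the AS, so two partitions of the same AS do not invalidate each other's sequence numbers.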

Encoding fault information. Besides the information used by BGP, G-BGP uses the following fault
information: point of channel-withdrawal POCW, point of segment-withdrawal POSW, point of AS-
failure POAS, point of channel-failure POCF, and point of node-join PONJ. Therefore, to implement
G-BGP in a way that allows graceful migration of and interoperability with BGP, one key issue is to incorporate
fault information into the existing BGP message format such that G-BGP and BGP can inter-operate.

In  BGP  [24],  an  UPDATE  message  has  a  variable-length  field  Path  Attributes  with  a  maximum  length 
of  65,535  bytes.  The  Path  Attributes  field  consists  of  a  sequence  of  path  attributes,  such  as  AS_PATH.  Each 
path attribute is a 3-tuple <attribute type, attribute length, attribute value> of variable length. Attribute
Type  is  a  two-octet  field  that  consists  of  the  Attribute  Flags  octet  followed  by  the  Attribute  Type  Code  octet, 
where  Attribute  Flags  determine  whether  an  attribute  is  optional  or  well-known  and  whether  it  is  transitive  or 
non-transitive.  An  attribute  is  optional  if  it  is  not  required  to  be  recognized  by  every  router,  and  an  attribute 
is  transitive  if  it  needs  to  be  propagated  by  every  router  no  matter  whether  the  router  recognizes  the  attribute. 
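The path attribute layout described above can be sketched in a few lines. The flag bit values are those of the BGP-4 specification; the function names `encode_path_attribute` and `decode_path_attribute` are hypothetical and the sketch does no validation beyond what is shown.

```python
import struct

# Attribute Flags bits (high-order bit first), as in the BGP-4 specification.
FLAG_OPTIONAL = 0x80
FLAG_TRANSITIVE = 0x40
FLAG_PARTIAL = 0x20
FLAG_EXTENDED_LENGTH = 0x10


def encode_path_attribute(type_code: int, value: bytes,
                          optional: bool, transitive: bool) -> bytes:
    """Pack one <attribute type, attribute length, attribute value> triple."""
    flags = 0
    if optional:
        flags |= FLAG_OPTIONAL
    if transitive:
        flags |= FLAG_TRANSITIVE
    if len(value) > 255:
        # Values longer than 255 octets use the two-octet length form.
        flags |= FLAG_EXTENDED_LENGTH
        header = struct.pack("!BBH", flags, type_code, len(value))
    else:
        header = struct.pack("!BBB", flags, type_code, len(value))
    return header + value


def decode_path_attribute(buf: bytes):
    """Return (flags, type_code, value, remaining_bytes) for the first attribute."""
    flags, type_code = struct.unpack_from("!BB", buf, 0)
    if flags & FLAG_EXTENDED_LENGTH:
        (length,) = struct.unpack_from("!H", buf, 2)
        start = 4
    else:
        length = buf[2]
        start = 3
    return flags, type_code, buf[start:start + length], buf[start + length:]
```

An optional transitive attribute therefore carries the flags octet 0xC0, or 0xD0 when its value needs the extended length form.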

We incorporate the POCW, POSW, POAS, POCF, and PONJ values into the UPDATE messages
of BGP by defining a new optional transitive path attribute FAULT_POINTS with type code 8. The Attribute
Value for FAULT_POINTS consists of a sequence of fault information m whose format depends on the type
of fault it conveys:

•  If m is a point of channel-withdrawal, then it is a 7-octet field with the first octet being 0, the second
and third octets being the ID of the AS that is one endpoint of the withdrawn link, the fourth and fifth
octets being the ID of the AS that is the other endpoint of the withdrawn link, and the remaining two
octets being the sequence number, i.e., m = <0, AS-id, AS-id, sn>;

•  If m is a point of segment-withdrawal, then it is a variable-length field with the first octet being 1 and
the rest being (Withdrawn-ASes, Suspected-ASes, AS-id, AS-id, BGP-id, t, sn). Withdrawn-ASes
and Suspected-ASes are two variable-length fields, each of which has two subfields
(length, data), where length is a 1-octet field specifying the length of data in octets and data
is a sequence of 2-octet IDs of the corresponding ASes; AS-id is a 2-octet field; BGP-id is
a 4-octet field denoting the BGP identifier of the node that generates the information; t is a 4-octet
field denoting the time in microseconds that has passed since the information was generated; and sn
denotes the sequence number of the AS that first sends out this point of segment-withdrawal, i.e.,
m = <1, (length, (AS-id)+), (length, (AS-id)+), AS-id, AS-id, BGP-id, t, sn>;

•  If m is a point of AS-failure, then it is a 5-octet field with the first octet being 2, the second and third
octets being the ID of the AS that has fail-stopped, and the remaining two octets being the sequence
number, i.e., m = <2, AS-id, sn>;

•  If m is a point of channel-failure, then it is an 11-octet field with the first octet being 3, the second and
third octets being the ID of the AS that is suspected, the fourth and fifth octets being the ID of the
AS that detects the link fail-stop, the following four octets being the time in microseconds that has
passed since the detection of the link fail-stop, and the last two octets being the sequence number, i.e.,
m = <3, AS-id, AS-id, t, sn>;

•  If m is a point of node-join, then it is a variable-length field with the first octet being 4 and the
rest being (Suspected-ASes, BGP-id, t, sn). Suspected-ASes is a variable-length field with two
subfields (length, data), where length is a 1-octet field specifying the length of data in octets and
data is a sequence of 2-octet IDs of the suspected ASes; BGP-id is a 4-octet field; t is a
4-octet field denoting the time in microseconds that has passed since the detection of the AS-join; and
sn denotes the sequence number of the AS that first sends out this point of node-join, i.e.,
m = <4, (length, (AS-id)+), BGP-id, t, sn>.
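The five encodings listed above can be sketched with Python's `struct` module. The leading type octets follow the tuples m = <0, ...> through m = <4, ...>. The function names are hypothetical, and the width of sn in the two variable-length formats is an assumption (2 octets, matching the fixed-length formats), since the text does not state it for those cases.

```python
import struct

# Type octets, in the order of the tuples m = <0,...>, <1,...>, ..., <4,...>.
POCW, POSW, POAS, POCF, PONJ = range(5)


def _as_list(as_ids):
    """(length, data): a 1-octet length in octets, then 2 octets per AS id."""
    data = b"".join(struct.pack("!H", a) for a in as_ids)
    if len(data) > 255:
        raise ValueError("AS list does not fit a 1-octet length field")
    return struct.pack("!B", len(data)) + data


def encode_pocw(as_a, as_b, sn):
    """Point of channel-withdrawal: 7 octets."""
    return struct.pack("!BHHH", POCW, as_a, as_b, sn)


def encode_posw(withdrawn, suspected, as_a, as_b, bgp_id, t_us, sn):
    """Point of segment-withdrawal: variable length (sn assumed to be 2 octets)."""
    return (struct.pack("!B", POSW) + _as_list(withdrawn) + _as_list(suspected)
            + struct.pack("!HHIIH", as_a, as_b, bgp_id, t_us, sn))


def encode_poas(as_id, sn):
    """Point of AS-failure: 5 octets."""
    return struct.pack("!BHH", POAS, as_id, sn)


def encode_pocf(suspected_as, detecting_as, t_us, sn):
    """Point of channel-failure: 11 octets."""
    return struct.pack("!BHHIH", POCF, suspected_as, detecting_as, t_us, sn)


def encode_ponj(suspected, bgp_id, t_us, sn):
    """Point of node-join: variable length (sn assumed to be 2 octets)."""
    return (struct.pack("!B", PONJ) + _as_list(suspected)
            + struct.pack("!IIH", bgp_id, t_us, sn))
```

The fixed-length results come out at 7, 5, and 11 octets, which agrees with the sizes stated in the list. All fields are packed in network byte order, as is conventional for BGP.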

(Remark:  purging-messages  and  state-clarifiers  used  in  uncertainty-resolution  are  incorporated,  in  a 
similar  way,  into  BGP  UPDATE  messages  as  two  optional  transitive  attributes.) 

Incremental deployment of G-BGP. Given that G-BGP uses an optional transitive path attribute to carry
fault information, G-BGP can be incrementally deployed and inter-operate well with BGP. Moreover, even
in the case of partial deployment, the improvement in convergence stability and speed is guaranteed for
those ASes that deploy G-BGP: when a fault occurs, information about the fault will be generated at some
node that deploys G-BGP and is affected by the fault; then the fault information is propagated, along with
BGP UPDATE messages, to other affected nodes; when the fault information reaches a node that deploys
G-BGP, the node can use the fault information to avoid fault-agnostic instability and to expedite the network
convergence.
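The reason a legacy BGP speaker does not block fault information is the standard handling of unrecognized optional attributes in BGP-4: an unrecognized optional transitive attribute is passed on with the Partial bit set, while an unrecognized optional non-transitive attribute is dropped. The sketch below illustrates that rule only. The function name `forward_attributes` is hypothetical, attributes are modeled as (flags, type code, value) tuples, and recognized attributes are passed through unchanged for simplicity.

```python
FLAG_OPTIONAL = 0x80
FLAG_TRANSITIVE = 0x40
FLAG_PARTIAL = 0x20


def forward_attributes(attrs, recognized_codes):
    """Return the attributes a speaker passes on to its peers.

    attrs: list of (flags, type_code, value) tuples.
    recognized_codes: set of type codes this speaker understands.
    """
    out = []
    for flags, code, value in attrs:
        if code in recognized_codes:
            out.append((flags, code, value))
        elif flags & FLAG_OPTIONAL and flags & FLAG_TRANSITIVE:
            # Unrecognized optional transitive: pass along, mark as Partial.
            out.append((flags | FLAG_PARTIAL, code, value))
        elif flags & FLAG_OPTIONAL:
            # Unrecognized optional non-transitive: quietly dropped.
            continue
        else:
            raise ValueError("unrecognized well-known attribute")
    return out
```

A speaker that does not recognize type code 8 in the role the paper assigns to it would thus relay the attribute with flags 0xE0 instead of 0xC0, and a downstream G-BGP node can still read the fault information.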

Approaches to reducing distribution-inherent instability. Even though distribution-inherent instability
does not cause much delay in BGP convergence, it may enlarge the affectation regions of faults when non-
SPF route ranking policies are used. As a result, some nodes are affected, even if they do not have to change
routes in the presence of faults. Therefore, the time taken for G-BGP and BGP to converge is increased by
an amount depending on the number of such nodes. One way to ameliorate this issue of enlarged affectation
region is to use the technique of local stabilization [3], which contains the impact of distribution-inherent
instability locally around where it occurs, so that the affectation region is bounded in diameter (only as a
function of the degree of fault perturbation in a network).

Moreover, to reduce type-(i) distribution-inherent instability, one approach is to reduce the delay in
information sharing by propagating fault information faster; another approach is for nodes to wait conservatively
before changing routes, in the hope that fresher information will arrive.

8  Concluding  remarks 

The stability and speed of BGP convergence are closely related. To expedite BGP convergence and to avoid
mis-interaction between convergence instability and instability-suppression mechanisms (such as route-flap-
damping), we studied the nature of instability during BGP convergence, and we classified the instability into
two categories: fault-agnostic instability and distribution-inherent instability. Distribution-inherent
instability does not cause severe delay in BGP convergence and provably exists in every distributed routing protocol.
Therefore, we focused on mechanisms to eliminate fault-agnostic instability, and we proved that the
elimination of fault-agnostic instability enables G-BGP to asymptotically improve BGP convergence speed and
to converge at an asymptotically optimal speed in several common scenarios where BGP convergence is
severely delayed (such as when a node or a link fail-stops).

In G-BGP, fault-agnostic instability is removed by rejecting invalid routes and obsolete fault information.
This is enabled by (i) propagating necessary fault information to the affected nodes, (ii) enforcing a total
order on fault information regarding the same AS, and (iii) resolving uncertainty as to the state of other
ASes. In general, we believe that propagating information about network dynamics (such as faults) and
using better state detection techniques can help the affected nodes adapt their behaviors during convergence,
which is also feasible given today's high-speed networks.

The  philosophy  of  “information  hiding”  in  hierarchical  structures  is  observed  in  G-BGP  in  the  sense  that 
it  does  not  expose  extra  information  at  the  intra-AS  level  to  the  inter-AS  level.  G-BGP  does  not  introduce 
additional  information  that  needs  to  be  maintained  (unboundedly  in  time)  between  far  away  nodes,  thus 
G-BGP  does  not  introduce  extra  instability  in  the  presence  of  network  dynamics.  In  general,  “information 
hiding”  helps  contain  the  impact  of  system  dynamics  locally  around  where  the  dynamics  occur,  and  to 
guarantee  system  stability,  “information  hiding”  should  be  observed  as  a  principle  when  we  design  new 
protocols  or  migrate  existing  protocols  [3]. 

We  mainly  focused  on  the  issues  related  to  fault-agnostic  instability  in  this  paper.  In  our  future  work, 
we  will  study  in  more  detail  the  impact  of  distribution-inherent  instability  on  BGP  convergence  speed;  we 
will  also  study  the  fundamental  limits  on  approaches  to  reducing  distribution-inherent  instability. 